
Theory and Method for the Statistical Investigation of Multimodal Promotional Practices in the Digital Era: A Data-driven Approach Based on Systemic Functional Linguistics and Social Semiotics

Published on Jan 25, 2023

1. Introduction

One of the major challenges the humanities have faced since the last decades of the twentieth century is the adoption of quantitative measures in the analysis of social phenomena, in order to draw reliable, reproducible results and determine their implications, as many academic fields in the hard sciences do (Sayers; Krippendorff 215). Doing so, however, requires a set of theoretical, analytical, and technological tools, as well as methodological frameworks, that still need to be created or adapted to the needs of humanities research.

Within the area of linguistic investigation, consistent efforts have been made to provide replicable results and reliable interpretations. Indeed, corpus analysis (the analysis of large collections of data that constitute more representative samples of real linguistic exchanges) has revolutionized the way in which language is examined scientifically and has enabled the discovery of patterns of language use and new understandings of human society that could not be accessed before (McEnery and Hardie 1; Taylor and Marchi 2–6).

The analysis of discourse and meaning-making processes, in particular, has benefited from these advances in linguistic research and from their commitment to representative findings. It has found new ways of approaching asymmetrical communication and examining power relations, and has reached a stage in which other areas of communication, such as social semiotics (the use of signs in social settings), have become involved (Kress and van Leeuwen). As linguistic resources are increasingly combined with other means of communication to provide a particular representation of facts and shape biased viewpoints, there is an urgent need to analyze how meaning interconnections are designed intentionally by specialists and institutions, as well as by individuals, to convey particular attitudes and beliefs in non-transparent ways and for persuasive purposes.

The two strands of recent research on language mentioned above, corpus linguistics and social semiotics, are now converging to offer tools and methods for investigating large samples of language and human communication, which often involve the interaction of multimodal resources such as image and text, through the lens of Systemic Functional Linguistics (Halliday and Matthiessen). Systemic Functional Linguistics (SFL) is a theory of language that considers humans as active designers of meaning through intentional linguistic and semiotic choices, which have a particular function and are affected by the socio-cultural and technological context, thus creating practices and interests that are shared and perpetuated among members of communities. In particular, the functions, or metafunctions, of language mediate between context and actual linguistic use by providing information on the discourse semantics and the nature of a situated communicative event. In concrete terms, they supply individuals with systematic meaning-making resources to (1) shape a particular representation of events and participants, (2) convey particular emotions and stances, and (3) confer coherence and cohesion on the overall message delivered.

In the field of tourism discourse, recent research has prompted lively discussions of an approach to communication that regards contemporary, image-centric media as playing a significant role in leveraging individuals’ socially driven search for what is extraordinary, or different from ordinary constraints and work routines (Stöckl et al., Shifts; Debord). However, current methods tend to involve either the extensive analysis of text corpora without a systematic multimodal analysis or Systemic Functional Linguistics approach (Maci; Manca; Bianchi), or pioneering multimodal analyses of small samples of image-text relations (Francesconi).

A few scholars have advocated establishing a line of research that studies the interactions of images and texts from a quantitative perspective (O’Halloran; Bateman, Text), and impressive results have been achieved in the realm of moving images (including film), comics, and document layouts (Wildfeuer and Bateman; Bateman et al., “An Open”; Bateman, Multimodality), as well as in the realm of static imagery in media and political discourse (Stöckl et al., Shifts; Caple et al.; Caple). Nevertheless, no corpus-based studies of still imagery and text have so far been conducted on tourism discourse on social media platforms such as Instagram.

This paper attempts to fill this gap in the literature by providing an overview of the research design, corpora, and methodological tools developed to investigate multimodal online tourism discourse in an ongoing digital humanities research project. The overall aim of the study is to analyze the interconnections between static imagery and written text on the official websites and Instagram accounts of three popular tourist boards through the lens of SFL, using a multimodal, mixed-methods approach. In other words, the project explores how the three metafunctions of this linguistic theory are realized in interconnected corpora of promotional textual and visual materials, and investigates whether their meaning varies across the respective online channels. Its focus is therefore twofold.

The first is the analysis of the multimodal choices of visual and textual strategies for the design of an imaginary, manipulated tourist experience (ideational metafunction) and of an asymmetrical, profit-driven relationship between the producer of the message and the recipient in terms of conveying positive attitudes, exclusivity and persuasion (interpersonal metafunction) in an overall coherent orchestration (textual metafunction). The textual metafunction is concerned specifically with the disposition of information within each semiotic resource and the degree of interdependence of semiotic resources in the socio-communicative act.

The second is the discovery of channel-specific differences in the frequency and use of visual and textual strategies adopted to achieve particular rhetorical aims in the marketing funnel of persuasion, or AIDA model (Manca). According to this model, different media and communicative strategies may persuade prospective customers depending on their progress in the virtual journey towards purchase, which starts with attracting attention through visibility and awareness (A), moves on to stimulating interest (I) and desire (D), and concludes with calls to action (A). Examining these meaning-making processes according to each channel’s audience and persuasive function may lead to the identification of a new genre of tourism communication on Instagram (Bateman, “Genre”; Francesconi 12–20).

In particular, this paper outlines the methodological framework that enabled the project to be conducted and contributed to achieving its research objectives. To this end, section 2 presents the corpus design and the theoretical tool devised for comparing text and image in tourism discourse. Section 3 briefly describes the analytical tool developed for the systematic manual tagging of the visual corpora, together with the statistical measures adopted to quantify the occurrence and variance of specific variables across the online corpora. Section 4 introduces the software designed to carry out the tagging procedure, and, finally, section 5 reports the current progress of this research and its future developments. Due to space constraints, the results achieved so far are not discussed in detail in this paper.

Potentially, the three tools and the overall methodological procedure presented in this paper may be useful to scholars in other fields of the humanities who wish to identify and quantify the occurrence of and relations between communication strategies realized through different semiotic resources. This, in turn, may provide the data necessary to compare—by means of statistical measurement—the frequency, distribution, and situated use of multisemiotic resources across a variety of contexts including digital media, genres, and discourses, consequently helping unveil actual patterns of realization of different communicative purposes.


2. Corpus Design and the Theoretical Tool

To gain a faithful understanding of contemporary, digital English tourism narratives, three popular tourist boards based in English-speaking countries were selected as the objects of this study. In particular, multimodal data consisting of imagery and written text was retrieved from their two main channels of communication, i.e., Instagram, the social media platform for brand awareness through content sharing, and their official website, the main source of their profit since it provides booking opportunities.

The six multimodal sub-corpora of promotional communication were collected in 2019 through Instagram’s application programming interface (API), which yielded six months of Instagram posts for each tourist board, and through web-scraping and a web-crawling procedure offered by the text-mining software Sketch Engine (Kilgarriff et al.). All sub-corpora of texts and images were cleaned, uploaded using different software, and measured statistically for quantitative and qualitative analysis. In particular, instances of repeated material were searched for in the textual sub-corpora, first automatically and then manually in the .txt files using regular expressions. While the corpora of language were uploaded and analyzed using the Sketch Engine tools, the visual corpora were uploaded and tagged manually using newly developed software, which is described in section 4. The composition of the visual sub-corpora according to tourist board (Tourism Ireland, Destination Canada, and Tourism Western Australia) and channel is provided in Table 1, whereas the Instagram and website sub-corpora of written language are shown in Tables 2–4. Table 1 shows the number of images collected across tourist boards and their channels, whose strategies have been tagged and measured statistically. Tables 2–4 show the composition of the six sub-corpora of written language, grouped by channel, with information on the number of tokens (the smallest units in a corpus, such as words, digits, and punctuation marks); the number of words, or unique items in a corpus; the type/token ratio (TTR);1 and the composition of each channel corpus in terms of the percentage of tokens contributed by each sub-corpus.
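The corpus statistics reported in Tables 2–4 (tokens, words, type/token ratio, and each sub-corpus’s share of tokens) can be illustrated with a short Python sketch, which also shows a simple line-level removal of repeated material with regular expressions. It is a sketch under stated assumptions, not the project’s procedure: the tokenizer is a simple regular expression rather than Sketch Engine’s, “words” are assumed to be tokens beginning with a letter, and TTR is computed here as distinct lower-cased word forms divided by words (other definitions, e.g. over all tokens, are also common). All function names are hypothetical, and counts produced this way would not match the published tables exactly.

```python
import re

# Assumed tokenizer (not Sketch Engine's): runs of letters with optional
# internal apostrophes/hyphens, runs of digits, or single punctuation marks.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*|\d+|[^\w\s]")


def dedupe_lines(lines):
    """Keep the first occurrence of each line, ignoring case and extra
    whitespace; blank lines are dropped. A stand-in for removing repeated
    material such as recurring captions or boilerplate."""
    seen = set()
    kept = []
    for line in lines:
        key = re.sub(r"\s+", " ", line).strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


def corpus_stats(text):
    """Return token, word, and type counts plus a type/token ratio."""
    tokens = TOKEN_RE.findall(text)
    # Assumption: a "word" is a token that begins with a letter.
    words = [t.lower() for t in tokens if t[0].isalpha()]
    types = set(words)
    ttr = len(types) / len(words) if words else 0.0
    return {"tokens": len(tokens), "words": len(words), "types": len(types), "ttr": ttr}


def token_shares(stats_by_subcorpus):
    """Percentage of a channel corpus's tokens contributed by each sub-corpus."""
    total = sum(s["tokens"] for s in stats_by_subcorpus.values())
    return {name: 100 * s["tokens"] / total for name, s in stats_by_subcorpus.items()}


if __name__ == "__main__":
    # Hypothetical captions, not drawn from the corpora.
    captions = [
        "Dive into crystal clear tides!",
        "dive into  crystal clear tides!",
        "Discover the dinosaur footprint.",
    ]
    text = "\n".join(dedupe_lines(captions))
    print(corpus_stats(text))
```

The second caption differs from the first only in case and spacing, so it is treated as a repeat; a manual pass over the remaining lines, as described above, would still be needed for near-duplicates.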

Table 1: Subdivision of Visual Sub-Corpora According to Tourist Board and Channel

Tourist Board | Number of Images (Instagram Posts) | Number of Images (Official Websites) | Total Images per Tourist Board
Tourism Ireland | | |
Destination Canada | | |
Tourism Western Australia | | |

 Sources: Tourism Ireland, Ireland, “Posts”; Destination Canada, Keep Exploring, “Posts”; Tourism Western Australia, Western Australia, “Posts”


Table 2: Subdivision of Written Sub-Corpora According to Tourist Board on the Official Website

Website Corpus | | | % (Tokens) | Type/Token Ratio (TTR)
Tourism Ireland | | | |
Destination Canada | | | |
Tourism Western Australia | | | |

Sources: Tourism Ireland, Ireland; Destination Canada, Keep Exploring; Tourism Western Australia, Western Australia


Table 3: Subdivision of Written Sub-Corpora According to Tourist Board on Instagram

Instagram Corpus | | | % (Tokens) | Type/Token Ratio (TTR)
Tourism Ireland (@tourismireland) | | | |
Destination Canada (@explorecanada) | | | |
Tourism Western Australia (@westernaustralia) | | | |

Sources: Tourism Ireland, “Posts”; Destination Canada, “Posts”; Tourism Western Australia, “Posts”


Table 4: Subdivision of Written Sub-Corpora According to Channel


Channel | Total Tokens per Channel | Total Words per Channel | Type/Token Ratio (TTR)

Sources: Tourism Ireland, Ireland, “Posts”; Destination Canada, Keep Exploring, “Posts”; Tourism Western Australia, Western Australia, “Posts”


In order to analyze the visual and written strategies in the corpora and connect the findings, two tools were necessary: first, a detailed table describing the different layers of analysis and of comparison between semiotic resources, namely text and images (Kress and van Leeuwen), according to a multimodal communicative framework; and second, an analytical tool for the quantitative investigation of visual corpora and the quantification of their features through statistical measurement.

Indeed, just as text and discourse are inspected by quantifying and measuring word occurrences and choices by means of “wordlists” and “keyword lists” (McEnery and Hardie 48–53), images too need to be investigated in terms of the presence or absence of features (visual, semiotic means of expression), in order to verify the existence and variance of visual strategies across images and, in this study, across channel corpora. Variance in the frequency and use of strategies across these corpora of text and images, which represent multimodal communication on different digital channels with particular promotional aims, would then provide evidence for the existence of a new Instagram genre of tourism discourse (Bateman, “Genre”).
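By analogy with a wordlist, each tagged image can be treated as the set of visual features coded as present, and the share of a channel’s images that contain a feature then serves as that feature’s relative frequency in the channel. The Python sketch below assumes this binary presence/absence coding; the function name, the data layout, and the feature labels are hypothetical and are not the project’s actual tag set.

```python
from collections import Counter, defaultdict


def feature_frequencies(images):
    """For each channel, return the proportion of images in which each
    feature was coded as present (features never coded are omitted)."""
    present = defaultdict(Counter)  # channel -> feature -> number of images
    totals = Counter()              # channel -> number of images
    for image in images:
        channel = image["channel"]
        totals[channel] += 1
        # Converting to a set counts each feature at most once per image.
        present[channel].update(set(image["features"]))
    return {
        channel: {feature: count / totals[channel] for feature, count in present[channel].items()}
        for channel in totals
    }


if __name__ == "__main__":
    # Hypothetical tags for four images.
    sample = [
        {"channel": "instagram", "features": {"reaction_process", "high_angle"}},
        {"channel": "instagram", "features": {"reaction_process"}},
        {"channel": "website", "features": {"action_process", "eye_level_angle"}},
        {"channel": "website", "features": {"reaction_process", "eye_level_angle"}},
    ]
    for channel, freqs in feature_frequencies(sample).items():
        print(channel, dict(sorted(freqs.items())))
```

Normalizing by the number of images per channel matters because the channel corpora need not be the same size; raw counts would otherwise favour the larger corpus.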

The main challenges faced while developing these two tools concerned two aspects. The first is the quantification of visual strategies, which are built on a qualitative, abstract, and empirically untested theory of social semiotics (Kress and van Leeuwen) derived from the SFL theory of language (Halliday and Matthiessen). In particular, Gunther Kress and Theo van Leeuwen’s approach explores how the three metafunctions of language are realized through specific strategies in other semiotic resources, such as static imagery. In other words, it assists in investigating (1) the representation of participants and events, (2) the choice of camera shots and angles that shape a particular relationship with the viewer, and (3) the distribution and salience of the elements in the picture.

The second aspect is the adaptation of this social semiotic theory to visual tourism discourse and its genres, which could lead to the effective, reliable identification and measurement of features in hundreds of images and, possibly, to an understanding of their underlying meaning and intended message. Indeed, multimodal communication research needs mode- and context-specific empirical analysis or, in other words, “situated discourse interpretation of material patterns” (Bateman, “Towards” 535). While the first tool is introduced in this section together with a practical application, the second is outlined in section 3, along with a brief description of the statistical tests conducted, including intercoder reliability tests.
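Intercoder reliability is mentioned here without the specific coefficient being named in this passage. Purely as an illustration, the Python sketch below computes Cohen’s kappa for two coders who each assigned one nominal label per image for the same variable; Krippendorff’s alpha is a common alternative that also accommodates more than two coders and missing codes. The function name and labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' nominal labels on the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same, non-empty list of items")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement from each coder's marginal label proportions.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical label throughout; kappa is formally
        # undefined (0/0), and 1.0 is returned here by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical process-type labels for the same four images.
    coder_1 = ["action", "reaction", "reaction", "action"]
    coder_2 = ["action", "reaction", "action", "action"]
    print(round(cohens_kappa(coder_1, coder_2), 3))
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why two coders agreeing on three of four images here obtain a kappa well below 0.75.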

In order to be able to investigate the corpora of tourism photography systematically, and to categorize and quantify the visual strategies that could then be connected to the linguistic findings of tourism written discourse, a multimodal comparative scheme was drafted (Table 5). This tool is an attempt to take into account not only the specific semiotic affordances, i.e., features, of each mode (or semiotic resource, i.e., text and image), but also the discourse specificities typical of tourism promotional and persuasive narratives (Dann; Urry) and the generic characterization and development according to the aim of the communication and the semiotic partition of meaning (Bateman, “Genre”; Maci 4–10; Bhatia 22–34, 30). In other words, this tool allows for the detection of variation among the corpora of channels in terms of multimodal meaning construction using the three metafunctions as a means of comparison. Indeed, by using the table as a theoretical support, it was possible to understand whether the intrasemiotic and intersemiotic patterns of variable occurrences varied in terms of frequency and use according to medium and revealed the existence of a specific persuasive genre of tourism photography and communication on Instagram (Francesconi 15–25, 29–39; Manca).

Table 5: Table for the Semiotic Comparison between Visual and Textual Features across the Three Metafunctions or Contextual Parameters

Metafunctions/Contextual Parameters (SFL) | Linguistic Strategies | Visual Strategies
Ideational – Field | Transitivity analysis: processes, participants, circumstances; places, colours, feelings, activities, subjects | Processes and participants; mood/setting; colour; shot/perspective/angle; tourism tropes
Interpersonal – Tenor | Appraisal theory: affect, appreciation, force sub-systems | Shot/angle; participants; colour; indexical reference; tourism tropes
Textual – Mode | Tourism discourse devices, generic considerations | Space distribution/composition; colour; tourism tropes

Sources: Halliday and Matthiessen; Martin and White; Kress and van Leeuwen; Dann; Bateman, “Genre”


The comparative scheme introduced in Table 5 builds on Halliday’s three metafunctions of language and the corresponding contextual parameters, which support the realization of meaning in context. The first, the ideational metafunction, serves to represent inner or outer experience linguistically through the system of Transitivity, i.e., processes (verbs), participants (e.g., subjects and direct objects), and circumstances. The second, the interpersonal metafunction, concerns the negotiation of meaning and the use of lexicogrammatical resources to convey attitudes and evaluations about the reality represented, analyzed here through the Appraisal theory of evaluation (Martin and White; Thompson). The third, the textual metafunction, focuses on cohesive ties between clauses in the creation of a text and was reinterpreted in this study to include discursive and generic considerations according to channel and semiotic labour (Webster 35, 41; Halliday and Matthiessen 33–41; for an overview of the nature and role of professional photography on Instagram, see Manovich).

The visual strategies build on the Grammar of Visual Design (Kress and van Leeuwen), which derives from Systemic Functional Linguistics, and fulfil all three metafunctions through static tourism imagery. These strategies relate to the visual choices made by discourse specialists in terms of (1) the participants, events, and settings selected, (2) the shots and angles adopted, and (3) the positioning and salience of elements in the photograph. Their categorization is described in more detail in section 3, which introduces the analytical tool for the manual tagging of images.

In this study, the three metafunctions appear to be realized through specific resources for promotional purposes. In particular, the contextual parameter of field, or ideational metafunction, supports the construction in tourism discourse of a manipulated and filtered representation of reality, both visually and verbally. Tenor, or the interpersonal metafunction, on the other hand, helps design a narrative of persuasion in which highly evaluative and evocative vocabulary boosts an emotive and visual pre-consumption of the experience and influences the perception of a destination. This conveys positive attitudes and eventually affects purchasing behaviour. Finally, the analysis of mode(s) and channels ties all the findings together and provides an understanding of how promotional narratives are realized on both channels (Instagram and website) and in both semiotic modes (textual and visual). More specifically, it includes the investigation of contextual discursive strategies, such as tourism tropes, as well as the semiotic centricity and weight of images and text. While tourism tropes are recurring themes of persuasion based on socially driven needs and dictated by capitalist ideology (Dann), semiotic centricity and weight (or “dominance” and “centrality,” respectively; Stöckl et al., “Introduction” 7) capture which strategies and modes are paramount in defining the overall meaning conveyed by a genre of communicative events in a particular socio-semiotic and discursive context. In other words, the textual metafunction investigates the degree of pervasiveness, salience, and complexity in the design of practices according to semiotic mode, revealing either the centrality of one mode and the subordinate role of the others in the communicative event, or their equal, complementary, and bidirectional status in the elaboration of meaning.

The focus on imagery in tourism discourse derives from the fact that the tourist gaze, i.e., the sum of the tourist’s expectations concerning the travel experience, is often and naturally created by the visual consumption (and imagination) of a product that is intangible by nature (Urry). Promotional communication about tourist destinations is therefore inherently visual and is characterized by specific features that make it appealing for a specific purpose. In particular, prospective tourists have been shown to seek in the destination a shaped representation of reality, dictated mainly by a capitalist-driven need for representations of (1) what is different and unknown, uncontaminated and extraordinary compared to the chaotic, industrialized, and predictable everyday (the strangerhood technique, described below); (2) events or situations free of work responsibilities, time constraints, and schedules (tense and play); and (3) the authenticity and typicality of gastronomic products, traditional customs, and natural wildlife (Dann; Urry).

Therefore, the tourist driving forces just mentioned are recurring rhetorical devices adopted by specialists of the tourism industry to instill imagination and feelings such as a longing for the travelling experience, and they are designed through specific choices in language and imagery construction to achieve social control (Said; Dann 2, 79–84) and manipulate the consumer’s perspective and behaviour, thus generating profit (Dann 76–77). The authenticity and strangerhood tropes, for example, are realized through the representation of natural, off-the-beaten-track landscapes of typical environments or habits, thus fulfilling the tourist’s desire for a different, striking reality in terms of temporal continuity or local specificity (Dann 62–68; Maci 137, 177).

In order to show concretely how each metafunction is realized by the written and visual semiotic modes, thus contributing to the representation of the tourist gaze, two multisemiotic artifacts were extracted from the corpora under investigation for a sample analysis. While the post in Figure 1 belongs to the Tourism Western Australia Instagram sub-corpus, Figure 2 is an excerpt from a blog post on the Destination Canada website (Keep Exploring).

Figure 1: An Instagram post shared by Tourism Western Australia (@westernaustralia) on 6 June 2019. Photo by @gypsylovinglight.

In Figure 1, the ideational meaning is realized linguistically through the inclusion of the participants “we” and “you,” alongside elements referring to the natural setting, such as “crystal clear tides,” or to local, historical entities, such as “dinosaurs” and “dinosaur footprint.” Visually, these elements are partly reproduced through the depiction of a back-posing human participant surrounded by a coastal environment and gazing at an indefinite point outside the picture. This type of event is called a “non-transactional reaction process,” which aims to shape perceptive and emotive expectations regarding the travel experience by attaching positive connotations to the destination in terms of aesthetics and the stimulation of imagination, i.e., of what can be seen or felt. The text enhances the persuasiveness of the visual message through the inclusion of metaphorical verbs of action such as “melt” (“at the sight of”), “discover,” “to be left mind-boggled,” “make” (“the ancient landmark worth the visit”), “dive into,” and “set your [eyes] upon,” or through mental verbs of cognition such as “to know.” In particular, the verb “to make,” whose denotative meaning is that of creation or performance, is used here metaphorically, as its subject (“this”) does not refer to a living entity, thus adding to the degree of extraordinariness of the experience promoted (grammatical metaphor). Finally, verbs such as “to be” accompanied by adjectives or nouns establish a relation between an entity and an attribute or identity (relational processes in SFL) and contribute to the overall promotional aim by describing the destination in terms of natural or historical uniqueness.

In terms of language, the interpersonal metafunction may be realized and analyzed through the choice of adjectives (or pre-modifiers in general), verbs, and nouns, including the type of emotion or attitude they convey. Those listed in the previous paragraph attempt to push the recipient of the message to agree with the evaluation of the destination advanced by the discourse specialist by negotiating attitudes of evoked satisfaction (instant gratification from mentally visiting a place, a “hidden gem”) and reaction (the generation of an internal, irrational response to an external stimulus through an immediate impact on the reader’s perception, e.g., “magnificent,” “beautiful”). Visually, asymmetrical evaluations and relationships are conveyed through gaze, shots, and angles. In particular, the absence of a gaze directed at the viewer, together with the dehumanized but still close position (medium shot) of the participant, seen from a slightly higher angle, invites consumers to relate to and identify with the participant, and to imagine visiting the idyllic and untamed natural destination, with its “crystal clear” waters and “weathered rocks.” The feeling of superiority created by the high angle is meant to put the viewer in an apparent position of power and control over the visual consumption of the experience and over the main participant, a marketing strategy to attract attention, stimulate desire, and eventually lead to action (Francesconi).

Finally, the textual meaning is realized through professional design strategies of information distribution, such as the rule of thirds or negative space (Manovich), which create an aesthetically pleasant and attractive view, and through the semiotic partitioning of the labour assigned to each mode. In particular, on Instagram, the image plays a major role in communicating meaning, whereas the text is subordinate and anchors the meaning already conveyed by the picture. On the website, by contrast, the semiotic partitioning of meaning is reversed, as the image acts as a subordinate illustration of the meaning represented by the written text. This is confirmed by the fact that website images not only lack the aesthetically appealing, evocative design features and the dehumanized static representations found in Figure 1 but are also accompanied by informative text and practical suggestions. In Figure 2, the participants and settings highlight other aspects typical of the destination (in this image, a couple of friends talking on a pier, a camera, an industrialized city centre on the horizon, and the sea) and are represented through a medium shot and an eye-level angle that aim to establish a social, equal relationship with the main participants of the tourist experience depicted, namely the smiling tourists engaged in activities. The focus on human and social traits, such as the presence of gaze and smile, combined with the medium shot, makes the picture less evocative and more spontaneous. Indeed, its main function is to accompany the text as an indexical reference, or evidence, rather than to boost imagination.
Here, the linguistic resources carry the most meaningful information: they convey positive attitudes and expectations regarding the experience through mental processes (such as “believe” and “look out”) and relational ones (with attributes such as “major beach city,” “one of the 10 best,” “popular site,” “comfort of some warm sand”), and they offer a list of activities that underlines the uniqueness of the destination through the quantity of options easily available. These textual characteristics, together with the inclusion of industrialized settings and the subordinate or nonexistent role of evocative pictures, are among the indicators of a different multimodal orchestration design across digital platforms, and of the opposite gazes they depict. Indeed, the website’s focus on aspects of the destination’s authenticity beyond the natural and uncontaminated, together with a more informative language of abundance and popularity, indicates an interest in shaping a tourist gaze of collectivity and mass-targeted planning, as opposed to the solitary, adventurous, and constraint-free travel experience offered by Instagram pictures: the romantic gaze of the postmodern tourist (Urry). This distinction is possibly due to the different needs, desires, and behaviours of social media users, who scroll through abundant spectacular content with little time and cognitive effort (Debord), compared to prospective customers who have decided to take a vacation in a specific destination and need concrete information from a website to organize the trip, including the accessibility of services and activities.

Figure 2: A screenshot of a page on the Destination Canada website Keep Exploring, “10 of the Most Scenic Vancouver Views,” showing Jericho Beach.

This is also confirmed by the menu tab of Keep Exploring (see Figure 3), in which buttons such as “Where to Go,” “What to Do,” “Plan your Trip,” and “Book your Trip” occur. Placing the cursor on “What to Do” displays a list of activities organized by theme and event, followed by hyperlinks to purchase offers (see, for example, “10 Canadian Festivals and Events that Heat Up Each Winter”). The same may be said of the websites of Tourism Ireland and Tourism Western Australia, on which practical tips, events, and tour prices appear alongside buttons such as “read more,” which confer more semiotic weight on the written mode (see “Walking” and “Perth Festival and Fringe Festival”).

Figure 3: A screenshot of the What to Do menu on the Destination Canada website Keep Exploring.

The variation in multimodal promotional strategies across channels, in terms of metafunctions and semiotic systems, suggested by this sample analysis was confirmed by the quantitative tracking of patterns, made possible by the development and testing of the tree tagging tool described in the next section. Indeed, a hierarchical tagging system with three main sets of macro-categories and sub-categories related to the three metafunctions enabled the detection, quantification, and measurement of the occurrence of each visual strategy in each image of the visual sub-corpora under investigation.


3. An Analytical Tool for Visual Analysis and Statistical Measures

3.1 The Analytical Tool

For the annotation, categorization, and quantification of visual features across visual corpora, a tree tagging system was developed and tested on a set of tourist images. As mentioned in section 2, this tagging system and its main macro-categories build directly on the three SFL metafunctions and the Grammar of Visual Design. The system includes two additional macro-sets of categories: one for the semiotic resource of colour, along with its harmonies and degree of saturation, and one listing the sociological approaches (i.e., tourist driving forces) from which tourism discourse draws its epistemological and ideological power and which are evoked by the interrelation of the strategies selected for each metafunction. Although Table 5 includes a few references to the macro-variables or strategies of colour and tourism discourse (in the right-hand column, Visual Strategies), these last two macro-categories are not introduced in this section. Additionally, in Table 5, macro-strategies or variables such as participant, angle, and shot, which pertain to different macro-categories (or metafunctions) and are introduced as such in the tagging system below, are listed under more than one metafunction because they convey more than one layer of meaning. The quantification and understanding of the role and nature of participants, for example, are fundamental at both the ideational and the interpersonal level.

The macro-categories that relate directly to the three linguistic metafunctions are listed in Table 6: (1) representation of reality, (2) relationship with the audience, and (3) composition. Each consists of a series of sub-categories (or macro-variables) that characterize images and from which the main variables, i.e., the specific choices of meaning construction, derive. Each sub-category offers different tagging options. In the first macro-category (1), these options capture the type of setting and both the presence and the type of process enacted by humans. In the second (2), the type of shot and the perspective technique are classified. In the third (3), the distribution and weight of the elements in the photograph are analyzed.

The importance of this tool lies in its capacity to measure exactly which visual strategies are adopted in the promotional design of tourism images and how frequently they occur across channels. The patterns of occurrence and the interrelation of these variables provide insight into the intentional meaning construction underlying the images and into their manipulative power in tourism discourse. For example, the distinction between action and reaction processes (the latter include the subject’s static observation of either animate or inanimate entities2), and the preference for one of them in discourse, construct a specific representation of the imaginary world and experience being promoted. On the one hand, individuals portrayed in activities communicate agency and convey concrete information about a tourist destination. On the other hand, reaction processes with dehumanized, static subjects photographed from behind communicate a passive pre-consumption of the experience and build particular perceptual and emotional expectations, which are evoked mainly by watching people in contemplation (Francesconi; Debord). As mentioned above, high shots depicting the vastness of an aesthetically appealing landscape reveal the intention to assign viewers a “romantic gaze” position, i.e., power, superiority, and control over an exclusive experience, together with a claimed ownership of the feelings evoked.

Table 6: Tree Tagging System for the Annotation of Tourism Images Building on Halliday’s Three Metafunctions



Sub-Category (Macro-Variables): Main Variables

Representation of Reality
  • Humans; Animals
  • Action; Reaction; Transactional action/reaction; Non-transactional action/reaction
  • Sunrise/sunset; Type of weather
  • Natural; Artificial; Cultural; Historical; Gastronomic; Analytical
  • Sport; Recreational action; Entertainment; Transportation

Relationship with the Audience
  • Direction of Gaze: Towards the represented participant; Towards the interactive participant
  • Camera Shot: Close; Medium; Long; Very long
  • Camera Angle (Perspective)
      • Subjective Image: Vertical angle (low, eye-level, high, very high); Horizontal angle (frontal, oblique)
      • Objective Image: Direct frontal; Perpendicular top-down
  • Depth (Technique): Converging lines (vanishing points); Blurred background/selective focus; Element in the foreground; Overlap; Less detailed/coloured background

Composition
  • Space Distribution
      • Rule of Thirds: One point, more points; Scenic rule of thirds (water, land, sky); Lines (horizontal, vertical)
      • Other techniques: Centric; Polarization; Symmetry
  • Visual Flow: Leading line(s); Connecting dots; Framework
  • Visual Weight: Landscape element; Represented living participant; Object
  • Reflection; Sharpness of focus

Source: Adapted from Kress and van Leeuwen, Manovich, Mai et al.
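The hierarchy in Table 6 can be pictured as a tree in which choosing a child node implies its parent nodes. The following is a minimal sketch of that idea in Python, assuming a nested-dictionary encoding of an abridged fragment of the table. It is not SRIT’s internal format, the names TAG_TREE, leaf_paths, and with_ancestors are hypothetical, and the leaves of the first macro-category are listed directly under it for brevity.

```python
# Abridged, hypothetical encoding of part of Table 6 (not SRIT's internal format).
# Each key is a node; an empty dict marks a leaf tag (a main variable).
TAG_TREE = {
    "Representation of Reality": {
        # Intermediate sub-category labels omitted; leaves listed directly.
        "Humans": {},
        "Animals": {},
        "Action": {},
        "Reaction": {},
    },
    "Relationship with the Audience": {
        "Camera Shot": {"Close": {}, "Medium": {}, "Long": {}, "Very long": {}},
        "Camera Angle (Perspective)": {
            "Subjective Image": {"Vertical angle": {}, "Horizontal angle": {}},
            "Objective Image": {"Direct frontal": {}, "Perpendicular top-down": {}},
        },
    },
    "Composition": {
        "Visual Weight": {
            "Landscape element": {},
            "Represented living participant": {},
            "Object": {},
        },
    },
}


def leaf_paths(tree, prefix=()):
    """Yield every root-to-leaf path as a tuple of node names."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


def with_ancestors(paths):
    """Expand leaf tags so that every parent node they imply is included too."""
    expanded = set()
    for path in paths:
        for depth in range(1, len(path) + 1):
            expanded.add(path[:depth])
    return expanded


VALID_TAGS = set(leaf_paths(TAG_TREE))

# One image tagged with two leaves from different macro-categories.
image_tags = [
    ("Relationship with the Audience", "Camera Shot", "Long"),
    ("Composition", "Visual Weight", "Landscape element"),
]
assert all(tag in VALID_TAGS for tag in image_tags)
print(len(VALID_TAGS))                  # 15 leaf tags in this abridged tree
print(len(with_ancestors(image_tags)))  # 6 nodes: 2 leaves plus their 4 ancestors
```

Counting ancestors explicitly mirrors the distinction, discussed in section 3.2, between the necessary selection of a parent node and the actual choice of one of its sub-categories.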


As already mentioned in section 2, images appear to play different roles on different channels, and this was confirmed both quantitatively, through annotation and statistical measurement, and qualitatively. Figures 4–6 below are further examples of website and Instagram pictures that illustrate other major differences discussed in the analysis: spontaneous pictures of shared, guided everyday and gastronomic activities, as opposed to staged, dehumanized pictures taken from high angles with long shots, in which the main participant is vast nature that can be tamed and enjoyed from an exclusive position of power and visual consumption.

Figures 4 and 5: Images extracted from Tourism Ireland’s and Tourism Western Australia’s website sub-corpora showing typical tourist (Figure 4) and mass-targeted activities (Figure 5) (Ireland; Western Australia, captured in 2019).

Figure 6: Image extracted from Tourism Western Australia’s Instagram sub-corpus showing an uncontaminated marine environment (“Posts”). An Instagram post shared by Tourism Western Australia (@westernaustralia) on 13 February 2019. Photo by @mattfieldes_photog.


3.2 Statistical Measures

In order to reveal patterns of use of particular visual strategies across channel corpora, the frequency rate of each tag occurrence was calculated. The statistical quantification contributed to providing an answer to the main research questions outlined in section 1:

  • Are there any similarities in the use of visual strategies in the tourist images on Instagram and the official website?

  • Is there a significant variance in the use of the visual strategies across channel corpora?

The statistical analysis was carried out by adopting a series of statistical measures using different software technologies:

  • Descriptive statistics: frequency rates and means (jamovi)

  • Inferential statistics: one-way ANOVA (jamovi); Chi-Square, Principal Component Analysis, Correspondence Analysis, Factor Analysis, and Regression (R)

  • Intercoder reliability: Scott’s pi, Cohen’s kappa, and Krippendorff’s alpha
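As an illustration of the descriptive step, the sketch below computes, for each channel, how many images carry each tag and what share of that channel’s images this represents. It assumes each image is stored as the set of leaf tags assigned to it; the data and the function name frequency_rates are invented for illustration, and the study itself computed these measures in jamovi.

```python
from collections import Counter

# Toy data (invented): each image is the set of leaf tags assigned to it.
corpus = {
    "website": [
        {"Action", "Medium"},
        {"Action", "Long"},
        {"Reaction", "Long"},
        {"Action", "Close"},
    ],
    "instagram": [
        {"Reaction", "Very long"},
        {"Reaction", "Long"},
        {"Action", "Very long"},
    ],
}


def frequency_rates(images):
    """Map each tag to (number of images carrying it, share of all images)."""
    counts = Counter(tag for tags in images for tag in tags)
    n_images = len(images)
    return {tag: (count, count / n_images) for tag, count in counts.items()}


for channel, images in corpus.items():
    rates = frequency_rates(images)
    for tag in sorted(rates):
        count, share = rates[tag]
        print(f"{channel:<10}{tag:<10}{count:>3}{share:>8.1%}")
```

Relative rates rather than raw counts make channels of different sizes comparable.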

In the first phase, the raw data (i.e., the occurrences of each tag for each image in each sub-corpus) were extracted from the tagging software (section 4) and exported to Excel, where the occurrences of each variable were grouped by channel. This made it possible to run a one-way ANOVA in jamovi, which calculated the mean of each tag’s occurrences across the agency sub-corpora and compared it across channel corpora. This procedure identified statistically significant variance in the use of specific tags on the different channels. For this study, results with p-values < 0.05 were considered statistically significant: a difference at least that large would be expected less than 5% of the time if the channel made no difference, and such results are therefore taken to indicate a pattern of use within a specific social practice (Şahin and Aybek). These results were then compared with the Chi-Square and other inferential statistical analyses performed in R (Bateman and Hiippala; Field et al.). The R script includes correlation plots and contingency tables that confirmed an association between the categorical variables under investigation, i.e., the channels and a series of tags, while revealing no particular difference in the use of visual strategies across tourist boards. In particular, the Chi-Square residuals, visualized for each tag, show how each tag varies across channels. In other words, they indicate whether, where, and by how much the observed value for each visual strategy on each channel differed from the value expected by chance.
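Both inferential steps rest on simple arithmetic. The following minimal sketch, using invented counts, computes the one-way ANOVA F statistic (between-group versus within-group variance of one tag’s occurrences across agency sub-corpora) and the Pearson residuals of a chi-square test, (observed − expected) / √expected, where a large positive residual marks a tag used more often on a channel than chance alone would predict. It does not compute p-values, which come from the F and chi-square distributions (for example in jamovi, or in R via chisq.test(), whose residuals component holds these same Pearson residuals). The function names are hypothetical.

```python
import math
from statistics import mean

# Invented counts: occurrences of one tag in each agency sub-corpus, by channel.
groups = {
    "website": [12, 15, 9],     # e.g. three tourist boards
    "instagram": [30, 27, 33],
}


def one_way_anova(samples):
    """Return (F, df_between, df_within) for a list of independent samples."""
    values = [x for s in samples for x in s]
    grand_mean = mean(values)
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented contingency table: rows are tags, columns are channels.
observed = {
    "Close": {"website": 20, "instagram": 5},
    "Medium": {"website": 25, "instagram": 15},
    "Long": {"website": 15, "instagram": 20},
}


def chi_square(table):
    """Return (chi2, df, Pearson residuals) for a {row: {column: count}} table."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_tot = {r: sum(table[r].values()) for r in rows}
    col_tot = {c: sum(table[r][c] for r in rows) for c in cols}
    n = sum(row_tot.values())
    chi2, residuals = 0.0, {}
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n  # count expected by chance
            resid = (table[r][c] - expected) / math.sqrt(expected)
            residuals[(r, c)] = resid
            chi2 += resid ** 2  # chi2 is the sum of squared Pearson residuals
    return chi2, (len(rows) - 1) * (len(cols) - 1), residuals


f_stat, df_b, df_w = one_way_anova(list(groups.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")  # F(1, 4) = 54.0

chi2, df, residuals = chi_square(observed)
print(f"chi2({df}) = {chi2:.3f}")
for (tag, channel), r in residuals.items():
    print(f"{tag:<7}{channel:<10}{r:+.2f}")
```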

Finally, a series of inter-annotator agreement calculations was performed to ensure the objectivity and consistency of the tagging procedure. To this end, Scott’s pi was added as a feature to the software program developed for the manual annotation. This measure was selected from the inter-coder consistency measures discussed by Bateman et al. (Multimodality 198–204), as it is considered suitable for measuring the reliability of tagging procedures in multimodal analyses, and thus for obtaining reproducible results and qualitative interpretations and for replicating empirical methods (Krippendorff 215). Scott’s pi measures the agreement between the tagging choices of two annotators who tag the same set of images independently, with the same tagging system and the same set of instructions, and corrects this observed agreement for the agreement expected by chance.

In this research study, only 18% of the images (30 images per sub-corpus) were also tagged by a second coder, who had previously been instructed on the meanings of the tags through a reading scheme. A Scott’s pi value above 0.7 is generally taken to indicate a reliable coding system, i.e., agreement well beyond what chance alone would produce. The average value returned by the software was 0.894, which confirmed the reliability of the tagging procedure.
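Scott’s pi compares the observed proportion of agreement P_o with the agreement P_e expected by chance, estimated from the category distribution pooled across both coders: π = (P_o − P_e) / (1 − P_e). A minimal sketch for the simplest case (two coders, one nominal choice per image, no missing tags) follows; the coders’ labels are invented and scotts_pi is a hypothetical helper, not SRIT’s implementation.

```python
from collections import Counter

# Invented labels: two coders' Camera Shot choices for the same ten images.
coder_a = ["Long", "Long", "Close", "Medium", "Long",
           "Very long", "Medium", "Long", "Close", "Long"]
coder_b = ["Long", "Long", "Close", "Medium", "Medium",
           "Very long", "Medium", "Long", "Close", "Long"]


def scotts_pi(a, b):
    """Scott's pi for two coders assigning one nominal category per item."""
    if len(a) != len(b) or not a:
        raise ValueError("both coders must label the same, non-empty set of items")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement uses the category distribution pooled over both coders.
    pooled = Counter(a) + Counter(b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return (observed - expected) / (1 - expected)


pi = scotts_pi(coder_a, coder_b)
print(f"Scott's pi = {pi:.3f}")                       # Scott's pi = 0.854
print("reliable" if pi > 0.7 else "below threshold")  # reliable
```

Here the coders agree on 9 of 10 images (P_o = 0.9), but because “Long” dominates both sets of choices, chance agreement is already 0.315, so π is lower than the raw agreement rate.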

Because Scott’s pi may be inadequate for a hierarchical tagging system, in which the coding possibilities depend on the role of each variable in the tagging procedure, more suitable measures are being computed in R under Bateman’s supervision, including Cohen’s kappa (Cohen) and Krippendorff’s alpha (Krippendorff). These measures account for (1) the necessary selection of parent nodes as opposed to actual choices of one of their sub-categories, and (2) the choice between mutually exclusive child nodes as opposed to selections of non-exclusive child nodes. The reliability values for mutually exclusive choices are provided in Table 7 below. Following the interpretation scale reported in clinical research (McHugh), values above 0.61 indicate substantial agreement between the coders, and values above 0.81 indicate almost perfect agreement. As Table 7 shows, most variables reach a substantial reliability value.
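For comparison, Cohen’s kappa uses the same formula as Scott’s pi but estimates chance agreement from each coder’s own category distribution, while Krippendorff’s alpha compares observed with expected disagreement and corrects for sample size. The study computes these measures in R; the Python sketch below only illustrates the formulas for two coders, one nominal choice per image, and no missing values, so it does not reproduce the treatment of parent and child nodes described above. The bands in interpret follow the kappa scale quoted by McHugh, and values falling between the printed bands (e.g. 0.805) are assigned to the higher band as an assumption. Data and function names are hypothetical.

```python
from collections import Counter

# Invented labels: two coders' Camera Shot choices for the same ten images.
coder_a = ["Long", "Long", "Close", "Medium", "Long",
           "Very long", "Medium", "Long", "Close", "Long"]
coder_b = ["Long", "Long", "Close", "Medium", "Medium",
           "Very long", "Medium", "Long", "Close", "Long"]


def cohens_kappa(a, b):
    """Cohen's kappa: chance agreement from each coder's own distribution."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


def krippendorff_alpha_nominal(a, b):
    """Krippendorff's alpha (nominal metric) for two coders, no missing values."""
    n_values = 2 * len(a)  # every unit contributes two pairable values
    # Each disagreeing unit adds two off-diagonal coincidences, (c, k) and (k, c).
    observed_off = 2 * sum(x != y for x, y in zip(a, b))
    totals = Counter(a) + Counter(b)
    # Sum of n_c * n_k over all ordered pairs of different categories.
    expected_off = n_values ** 2 - sum(c * c for c in totals.values())
    return 1 - (n_values - 1) * observed_off / expected_off


def interpret(value):
    """Agreement band following the kappa scale quoted by McHugh (2012)."""
    bands = [(0.0, "no agreement"), (0.20, "none to slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if value <= upper:
            return label
    return "almost perfect"


kappa = cohens_kappa(coder_a, coder_b)
alpha = krippendorff_alpha_nominal(coder_a, coder_b)
print(f"Cohen's kappa = {kappa:.3f} ({interpret(kappa)})")  # Cohen's kappa = 0.855 (almost perfect)
print(f"Krippendorff's alpha = {alpha:.3f} ({interpret(alpha)})")  # Krippendorff's alpha = 0.861 (almost perfect)
```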

Table 7: Reliability Values for Mutually Exclusive Choices in All Sub-Corpora

Mutually exclusive child nodes (choices):

  • Process of Action, Process of Reaction
  • Transactional Action, Non-transactional Action
  • Transactional Reaction, Non-transactional Reaction
  • Sunrise, Sunset, Day, Night
  • Sunny, Cloudy, Stormy, Foggy, Snowy
  • Map, Icon/Logo
  • Sport Activities, Transportation, Recreational Activities, Entertainment, Other Activities
  • Water Sports, Snow Sports, Hiking, Camping/Trekking, Other Sport Activities
  • Car Driving, Other Means of Transportation
  • Painting, Playing Music, Speaking, Walking, Other Recreational Activities
  • Festivals, Concerts, Movies, Other Entertainment Activities
  • Towards the Represented Participant(s), Towards the Interactive Participant
  • Close, Medium, Long, Very Long
  • Subjective Image, Objective Image
  • Low, Eye-level, High, Very High
  • Frontal, Oblique
  • Direct Frontal Angle, Perpendicular Top-down Angle
  • Represented Living Participant, Landscape Element, Object: 0.916




4. The Technological Tool

The tagging system developed for this study was created with a freely accessible software tool, Statistically Reliable Image Tagging (SRIT), which was designed for complex manual tagging procedures in large visual corpora. It is described here because of its major role in the realization of the current project, and because it is a novel open-access, web-based resource for identifying and quantifying occurrences of strategies in static visual data by means of tailored annotation systems. The software was developed by G. E. Pibiri, formerly a computer scientist at the National Research Council of Italy (ISTI-CNR) and currently Assistant Professor at Ca’ Foscari University. It builds on previous efficient tools such as UAM ImageTool Version 2.0 (O’Donnel) and, through a user-friendly interface, makes it possible to create elaborate tree tagging systems and to tag thousands of images. It also enables analysts to compute statistics within the platform and to export graphs and results to Excel.

The main features are listed below, following the tagging procedure in chronological order:

  • Models: this feature represents the preliminary stage in the manual annotation process which allows for the designing and editing of a hierarchical tagging system.

  • Images: this option enables users to upload and name a .zip file with the images that need to be tagged.

  • Projects: this icon connects the image dataset with the tagging model (a .json file) and creates a corpus.

  • Tags: this feature gives access to the corpus annotation.

  • Statistics: this option reports the absolute and relative percentages pertaining to each project, both in the tagging system and in an exportable chart. Correlation analysis (AND/OR) with embedded image visualization may also be conducted by selecting two or more tags to discover potential co-occurrences of strategies.

  • Reliability: this feature allows for the performance of the inter-coder reliability analysis. For the time being, the only measure included is the inter-annotator agreement measure Scott’s pi. Future implementations will include additional reliability tests and the possibility of distinguishing between mutually exclusive choices and non-mutually exclusive ones.

  • Jaccard and Weighted Hamming Index: these are statistical measures that compare tag choices between corpora of the same user. An embedded visualization tool displays, for each query, the images with the same features in each corpus, to show concretely how similar visual techniques are realized in different images.
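SRIT’s own definitions of these measures are not given here, so the following is only a minimal sketch under stated assumptions: AND/OR queries are treated as set containment and set intersection over each image’s tags; the Jaccard index compares the sets of tags used in two corpora, |A ∩ B| / |A ∪ B|; and the weighted Hamming index is taken to be the weighted share of tags present in one corpus but not the other. The toy projects, the weights, and all function names are hypothetical.

```python
# Invented toy projects: image id -> set of leaf tags. Not data from the study.
website = {
    "img01": {"Action", "Long", "Landscape element"},
    "img02": {"Reaction", "Very long", "Landscape element"},
    "img03": {"Action", "Medium", "Represented living participant"},
    "img04": {"Reaction", "Long", "Landscape element"},
}
instagram = {
    "img05": {"Reaction", "Very long", "Landscape element"},
    "img06": {"Reaction", "Long", "Object"},
}


def query(images, tags, mode="AND"):
    """Ids of images carrying all (AND) or at least one (OR) of the tags."""
    wanted = set(tags)
    if mode == "AND":
        return sorted(i for i, t in images.items() if wanted <= t)
    if mode == "OR":
        return sorted(i for i, t in images.items() if wanted & t)
    raise ValueError("mode must be 'AND' or 'OR'")


def tags_used(images):
    """Set of tags applied at least once in a corpus."""
    return set().union(*images.values())


def jaccard(a, b):
    """Shared tags divided by all tags used in either corpus."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def weighted_hamming(a, b, weights, default=1.0):
    """Weighted share of tags used in only one corpus (0 means identical)."""
    vocabulary = a | b
    total = sum(weights.get(t, default) for t in vocabulary)
    diff = sum(weights.get(t, default) for t in vocabulary if (t in a) != (t in b))
    return diff / total if total else 0.0


print(query(website, ["Long", "Landscape element"], "AND"))  # ['img01', 'img04']
print(query(website, ["Long", "Landscape element"], "OR"))   # ['img01', 'img02', 'img04']
A, B = tags_used(website), tags_used(instagram)
print(jaccard(A, B))  # 0.5
print(weighted_hamming(A, B, {"Represented living participant": 2.0}))
```

Weighting lets a mismatch on a theoretically central variable, such as the presence of living participants, count more than a mismatch on a peripheral one.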

Figure 7 shows the interface of the program, while Figure 8 provides an example of how the tagging procedure may be conducted.

Figure 7: The Statistically Reliable Image Tagging program interface with a list of the first three features described in section 4 (Pibiri and Mattei).


Figure 8: An example of the tagging procedure in the Statistically Reliable Image Tagging software (Pibiri and Mattei).


5. Current Advances in the Research Study and Conclusion

The adoption of the tools presented in the previous sections was crucial to the successful implementation of the study. Although this paper does not discuss the results achieved so far in detail, it seeks to explain the methodological framework designed for this multidisciplinary digital humanities project. Indeed, the statistical detection, through the notion of metafunction, of varying degrees of occurrence of specific features in multimodal meaning design allowed for a solid qualitative discussion of the patterns of difference in the use of both semiotic resources (language and image), and confirmed the existence of specific communicative goals across channels.

In other words, the overall analysis is helping to define new discursive and multimodal generic trends that, through visual strategies and intentional word choice, foreground specific aspects of the tourist destination in particular ways, depending on audiences’ social needs and behaviour on different media, including their degree of intentionality in the purchasing process. This investigation is therefore revealing the intentional design of socio-economic practices that overplay the role of travel as a solution to human needs and, through aesthetically appealing and evocative visual content, boost emotionally charged reactions for profit-making purposes. Future directions of this research include the publication of detailed reports and discussion of results, together with the application of the current methodology to socially relevant issues such as the promotion of sustainable behaviour in eco-tourism communication (Fletcher; Stamou and Paraskevopoulos). Systematic manual annotation of linguistic resources, together with automatic detection and machine learning implementations of the current tagging system, will also be considered.

In conclusion, the social and academic importance of digital humanities as a broad research field encompassing and integrating different disciplines lies in its capability to provide reliable solutions and objective answers to the investigation of social phenomena as well as a deeper understanding of human society. This is the type of methodological effort many scholars in the “soft sciences” are committed to nowadays, with the objective of contributing to the construction of knowledge with informed social understandings.


Works Cited

Bateman, John A. “Genre in the Age of Multimodality: Some Conceptual Refinements for Practical Analysis.” Evolution in Genres: Emergence, Variation, Multimodality, edited by Paola Evangelisti Allori, John Bateman, and Vijay K. Bhatia, Peter Lang, 2014, pp. 237–269.

Bateman, John A. Multimodality and Genre: A Foundation for the Systemic Analysis of Multimodal Documents. Palgrave Macmillan, 2008.

Bateman, John A. Text and Image: A Critical Introduction to the Visual/Verbal Divide. Routledge, 2014.

Bateman, John A. “Towards Critical Multimodal Discourse Analysis: A Response to Ledin and Machin.” Critical Discourse Studies, vol. 16, no. 5, 2019, pp. 531–539.

Bateman, John A., and Tuomo Hiippala. “Statistics for Multimodality: Why, When, How–An Invitation.” SocArXiv, 2020, pp. 1–19.

Bateman, John A., Francisco O. D. Veloso, Janina Wildfeuer, Felix HiuLaam Cheung, and Nancy Songdan Guo. “An Open Multilevel Classification Scheme for the Visual Layout of Comics and Graphic Novels: Motivation and Design.” Digital Scholarship in the Humanities, vol. 32, no. 3, 2017, pp. 476–510.

Bateman, John A., Janina Wildfeuer, and Tuomo Hiippala. Multimodality: Foundations, Research and Analysis – A Problem-oriented Introduction. De Gruyter Mouton, 2017.

Bhatia, Vijay K. Analyzing Genre Language Use in Professional Settings. Longman, 1993.

Bianchi, Francesca. “The Social Tricks of Advertising: Discourse Strategies of English-Speaking Tour Operators on Facebook.” Iperstoria, vol. 10, 2017, pp. 3–32.

Caple, Helen. “Analysing the Multimodal Text.” Corpus Approaches to Discourse, edited by Charlotte Taylor and Anna Marchi, Routledge, 2018, pp. 85–109.

Caple, Helen, Monika Bednarek, and Laurence Anthony. “Using Kaleidographic to Visualize Multimodal Relations within and across Texts.” Visual Communication, vol. 17, no. 4, 2018, pp. 461–474.

Cohen, Jacob. “A Coefficient of Agreement for Nominal Scales.” Educational and Psychological Measurement, vol. 20, no. 1, 1960, pp. 37–46. 

Dann, Graham. The Language of Tourism: A Sociolinguistic Perspective. CAB International, 1996.

Debord, Guy. The Society of the Spectacle. Black & Red, 1967.

Destination Canada [@explorecanada]. “Posts.” Instagram, 2022,

Destination Canada. Keep Exploring, 2022, Accessed 7 Sept. 2022.

Destination Canada. “10 Canadian Festivals and Events that Heat Up Each Winter.” Keep Exploring, 2022, Accessed 7 Sept. 2022.

Destination Canada. “10 of the Most Scenic Vancouver Views.” Keep Exploring, 2022, Accessed 7 Sept. 2022.

Field, Andy, Jeremy Miles, and Zoë Field. Discovering Statistics Using R. Sage, 2012.

Fletcher, Robert. “Ecotourism Discourse: Challenging the Stakeholders Theory.” Journal of Ecotourism, vol. 8, no. 3, 2009, pp. 269–285.

Francesconi, Sabrina. Reading Tourism Texts: A Multimodal Analysis. Channel View Publications, 2014.

Halliday, M. A. K., and Christian M. I. M. Matthiessen. Halliday’s Introduction to Functional Grammar. Fourth edition, Routledge, 2014.

Kilgarriff, Adam, Vít Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, and Vít Suchomel. “The Sketch Engine: Ten Years On.” Lexicography, vol. 1, no. 1, 2014, pp. 7–36.

Kress, Gunther, and Theo van Leeuwen. Reading Images: The Grammar of Visual Design. Routledge, second edition, 2006.

Krippendorff, Klaus. Content Analysis: An Introduction to Its Methodology. Sage, 2004.

Maci, Stefania M. English Tourism Discourse: Insights into the Professional, Promotional and Digital Language of Tourism. Hoepli, 2020.

Mai, Long, Hoang Le, Yuzhen Niu, and Feng Liu. “Rule of Thirds Detection from Photograph.” 2011 IEEE International Symposium on Multimedia, 2011.

Manca, Elena. Persuasion in Tourism Discourse: Methodologies and Models. Cambridge Scholars Publishing, 2016.

Manovich, Lev. Instagram and Contemporary Image. Manovich, 2016, Accessed 14 July 2022.

Martin, James R., and Peter R. R. White. The Language of Evaluation. Palgrave Macmillan, 2005.

McEnery, Tony, and Andrew Hardie. Corpus Linguistics: Method, Theory and Practice. Cambridge University Press, 2011.

McHugh, Mary L. “Interrater Reliability: The Kappa Statistic.” Biochemia Medica, vol. 22, no. 3, 2012, pp. 276–282.

O’Donnel, Mick. UAM Image Tool. Version 2.0. 2010.

O’Halloran, Kay L. “Multimodal Discourse Analysis.” The Continuum Companion to Discourse Analysis, edited by Ken Hyland and Brian Paltridge, first edition, Continuum, 2011, pp. 120–137.

Pibiri, Giulio Ermanno, and Elena Mattei. Statistically Reliable Image Tagging. 2020.

Şahin, Murat Doğan, and Eren Can Aybek. “Jamovi: An Easy to Use Statistical Software for the Social Scientists.” International Journal of Assessment Tools in Education, vol. 6, no. 4, 2019, pp. 670–692.

Said, Edward W. Orientalism: Western Concepts of the Orient. Pantheon, 1978.

Sayers, Jentery, editor. The Routledge Companion to Media Studies and Digital Humanities. Routledge, 2018.

Simon-Vandenbergen, Anne-Marie, Miriam Taverniers, and Louise J. Ravelli, editors. Grammatical Metaphor: Views from Systemic Functional Linguistics. John Benjamins, 2003.

Stamou, Anastasia G., and Stephanos Paraskevopoulos. “Images of Nature by Tourism and Environmentalist Discourses in Visitors Books: A Critical Discourse Analysis of Ecotourism.” Discourse & Society, vol. 15, no. 1, 2004, pp. 105–129.

Stöckl, Hartmut, Helen Caple, and Jana Pflaeging, editors. Shifts towards Image-Centricity in Contemporary Multimodal Practices. Routledge, 2020.

Stöckl, Hartmut, Helen Caple, and Jana Pflaeging. “Shifts towards Image-Centricity in Contemporary Multimodal Practices: An Introduction.” Shifts towards Image-Centricity in Contemporary Multimodal Practices, edited by Hartmut Stöckl, Helen Caple, and Jana Pflaeging, Routledge, 2020, pp. 1–18.

Taylor, Charlotte, and Anna Marchi, editors. Corpus Approaches to Discourse: A Critical Review. Routledge, 2018.

Thompson, Geoff. “Appraising Glances: Evaluating Martin's Model of APPRAISAL.” Word, vol. 59, no. 1–2, 2008, pp. 169–187.

Tourism Ireland. Ireland. 2022, Accessed 7 Sept. 2022.

Tourism Ireland [@tourismireland]. “Posts.” Instagram, 2022,

Tourism Ireland. “Walking.” Ireland, 2022, Accessed 7 Sept. 2022.

Tourism Western Australia. Western Australia, 2022, Accessed 7 Sept. 2022.

Tourism Western Australia. “Perth Festival and Fringe Festival.” Western Australia, 2022, Accessed 7 Sept. 2022.

Tourism Western Australia. “Ningaloo Reef.” Western Australia, 2022, Accessed 7 Sept. 2022.

Tourism Western Australia [@westernaustralia]. “Posts.” Instagram, 2022,

Tourism Western Australia [@westernaustralia]. “We’re totally melting at the sight of this hidden gem!” Instagram, 15 June 2019, Accessed 7 Sept. 2022.

Tourism Western Australia [@westernaustralia]. “#RoadTrip checklist: ✔️ postcard worthy pit stops, ✔️ sweeping ocean views and ✔️picturesque stretches of road to fall in love.” Instagram, 13 February 2019, Accessed 7 Sept. 2022.

Urry, John. The Tourist Gaze. Sage, first edition, 1990.

Wildfeuer, Janina, and John A. Bateman, editors. Film Text Analysis: New Perspectives on the Analysis of Filmic Meaning. Taylor & Francis, 2016.



This work was supported by the Department of Foreign Languages at the University of Verona within the MIUR Excellence Programme in Digital Humanities. In particular, the author wishes to thank her supervisor, Prof. Hartle, for her constructive feedback on her research and this paper, and Prof. Bateman for his support and expertise in empirical methods.
