Structured Document Authoring is Obsolete – Now Data Exchange is King


As the world transitions from documents as the medium of business information exchange to data-driven, information-based processes, structured document authoring will become obsolete in the life science sector. Generis’ James Kelleher outlines the opportunities that a data-first approach will deliver.

Structured authoring was designed to be the future of document authoring – the ultimate efficiency in presenting information. But a data-driven approach to information sharing is set to address issues with version control and provide a more traceable line back to the master source of intelligence. This does mean that structured document authoring, as a much-anticipated technology proposition, is already obsolete – before it really had a chance to get off the ground.

The original concept of structured document authoring, which dates back to the 1990s, is based on building routine documents from re-usable segments of content. However, this soon comes up against practical limitations. If the approved, reusable content assets are entire paragraphs or sentences, for instance, they will typically need to be tweaked for each use case. Each edit creates a new version of that content, with implications for change management.

In the meantime, the focus of Regulators – and of recommended business practice more generally – has shifted towards live data as the primary source of truth, and as a means of transforming processes. This move away from document-based submissions and reporting further erodes the business case for structured document authoring.

Although regulated documents as official records of a product won’t disappear overnight, their ‘Best Before’ date is drawing ever nearer. During the latter stages of the transition to ISO IDMP compliance in the EU, for instance, published documents will be phased out in favour of rolling data-based submissions: data that regulators can choose to analyse in their own way.

Ultimately, data-based information exchange will become the preferred norm for regulatory submissions, PSMF (safety master) files and APQR (annual product quality review) reports. In fact, PV case files in Europe are already submitted in data form.

Content management investment

Strategically, the focus of new content management investments must now be the data itself, and how this is managed so that it can be used more dynamically for publication – without the risk of a future loss of integrity or consistency between the data and any associated narrative.

Next-level structured content authoring places the emphasis on ‘data objects’. That data object might be ‘Study 123 is a 3-day short-dose study in male rabbits’, for instance. Creating a narrative now means pulling in these ‘data objects’ and inserting minimal ‘joining text’ to make the data more readable in a particular context.

Here, if core information changes, updates can be made at a source level and automatically cascaded down through all use cases for that data object, without the need for extensive manual intervention.
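The cascade described above can be sketched in miniature. This is a hypothetical illustration only – the `DataObject` class, registry and `render_narrative` function are invented for this example and are not any specific product’s API:

```python
from dataclasses import dataclass

@dataclass
class DataObject:
    """A single approved fact, maintained once at source."""
    object_id: str
    value: str

# The registry stands in for the master data source.
registry = {
    "study_123": DataObject(
        "study_123",
        "Study 123 is a 3-day short-dose study in male rabbits",
    ),
}

def render_narrative(template: str) -> str:
    """Pull data objects into a narrative; the template supplies
    only the minimal 'joining text' around them."""
    return template.format(**{k: v.value for k, v in registry.items()})

report = "As noted above, {study_123}, which informed the dosing rationale."
summary = "Background: {study_123}."

# A single source-level change cascades automatically to every
# narrative that references the data object.
registry["study_123"].value = (
    "Study 123 is a 5-day short-dose study in male rabbits"
)
print(render_narrative(report))
print(render_narrative(summary))
```

Because each narrative holds only a reference to the data object, no manual search-and-replace across documents is needed when the underlying fact changes.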

This approach to content preparation offers much more dynamism and flexibility than a structured document authoring scenario. With the persisting diversity in requirements between the different Regulatory authorities, this controlled flexibility is very useful.

A collaborative mindset

Moving away from documents and even from reusable content requires a different mindset, and this is probably one of the biggest barriers for companies currently.

Relying less on Word might seem to imply that teams will need to become proficient in XML. Yet this perception is tied up with the traditional treatment of content. In the new scenario, the focus is the master data and enriching the company-wide knowledge associated with it (around a given product and its evolving status), and editing can be done in the new breed of user-friendly tools, whether for data, Word or XML.

This is about teams from multiple functions all contributing to and enhancing one unified data source, rather than each continuing to enter their own particular information of interest into their respective systems (Clinical, Regulatory/RIM, etc).

Streamlining content sharing and data exchange

At a conservative estimate, working with data objects can reduce the effort of producing a final draft for approval by a factor of 10, thanks to the reduced specialist resources and manual steps needed in authoring and version control. That’s in addition to huge savings in the time and effort that would otherwise be needed to manage components – including decisions about levels of granularity, rules around re-use, traceability of data into the output, and an entire migration of larger documents into smaller documents/components.

Data objects are already streamlining information exchange processes in major industries. The airline and automotive industries, for instance, where precision, rigour and safety are as critical as they are in life sciences, already use trusted data objects to construct content.

It is entirely possible for companies to skip a generation of automated content authoring and go straight to a data-first approach to process management. The first step is to gather together data that is good quality, complete and current, and usable for the intended purposes. Companies can then start to add automation rules to use data in the right place at the right time, to streamline business and regulatory processes.

This may feel like a big adjustment. But in fact, companies now have a chance to leap-frog straight to a solution that is much more fit for purpose, transformational, pliable and sustainable in the long term.

About the author

James Kelleher is the founder and CEO of Generis Corporation, the creator of CARA™, a data and content management platform that helps companies in regulated industries, like Life Sciences, transform their complex business processes.