The Future of Data Analytics

“In God we trust, all others must bring data”.

W. Edwards Deming

Odyssey VC and Compliant Cloud CEO Oisín Curran gives a high-level overview of data analytics and looks to the possibilities ahead

Engineer and statistician William Edwards Deming paved the way for how analytics plays a key role across the lifespan of a regulated product today, and he gets straight to the critical point. We rely on and collaborate with our data scientists to build robust analytical models that inform and control the supply chain and manufacturing processes. Without data we will flounder, and crucially it must be accurate and reliable data. Let’s remember our first principles – garbage in, garbage out. 

The foundation of an analytics model is a train, validate and test cycle, followed by continuous model maintenance and retirement procedures. As such, analytics processes are developed in what are often called analytics “sandboxes”, or development environments. Here they undergo multiple iterations of development and improvement and are robustly tested prior to deployment into a production environment. But what resides in these sandboxes? Who has access to them? And how can we be sure of the integrity of the data underpinning these models that we are becoming more and more reliant on in production environments?
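
To make the train, validate and test cycle concrete, here is a minimal sketch of the first step: splitting a data set into three non-overlapping parts. The function name, the 60/20/20 proportions and the batch identifiers are illustrative assumptions, not a description of any particular platform.

```python
import random


def split_dataset(records, train_frac=0.6, validate_frac=0.2, seed=42):
    """Shuffle records reproducibly and split them into train / validate / test."""
    if not 0 < train_frac + validate_frac < 1:
        raise ValueError("train_frac + validate_frac must leave room for a test set")
    shuffled = list(records)
    # A fixed seed keeps the split repeatable, which helps when a model is re-examined later.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_validate = int(len(shuffled) * validate_frac)
    train = shuffled[:n_train]
    validate = shuffled[n_train:n_train + n_validate]
    test = shuffled[n_train + n_validate:]
    return train, validate, test


# Hypothetical batch identifiers standing in for real manufacturing records.
batches = [f"batch-{i:03d}" for i in range(100)]
train, validate, test = split_dataset(batches)
```

The model is fitted on the training set, tuned against the validation set, and the test set is held back until the end so that the final check uses data the model has never seen.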

Believe it or not, in the past these sandboxes have existed on the data scientist’s machine. Yes, take a breath! Snapshots of data have been, and at times still are, gathered from varying sources such as the production historian, MES, LIMS and transferred by simple means to a single person. Processes are improving, and we now see analytics sandbox environments pointing to central and shared databases. However, the level of control of such environments is at best questionable. When dealing with the challenges that come with analytics processes, such as time alignment and cleansing of data, a large amount of data manipulation is required. When this is being undertaken on snapshots of data in relatively uncontrolled environments and by any number of data scientists across the enterprise, the opportunity for error is massive.

We need to get better at this. We need to ensure that the data which informs so much of a regulated product’s lifecycle is of the utmost integrity, whilst of course ensuring it is readily available to the teams and processes that need it most. Having data pertaining to the entire lifecycle in one space, ideally incorporating everything across R&D, Manufacturing, Quality and Supply Chain, means we can further drive efficiencies with reliable analytics. The centralisation of disparate data sources in a compliant and controlled environment opens a massive opportunity for efficient analytics and accurate, targeted decision making. The team are passionate about data, data integrity in particular, and that drives the platform we deliver to our customers.

Imagine a world where data scientists are not just deployed to react but are continuously innovating and deploying analytical models to enhance operations. Imagine they had full end-to-end visibility of a product’s lifecycle in real time, predicting issues and informing preventative action. Imagine they were doing this in a controlled and compliant environment that satisfies regulatory requirements. Imagine no longer. It’s time to act. Then again, in the words of Deming – “It is not necessary to change. Survival is not mandatory”.

How TIM WOODS applies to paper-based systems

A major goal of the life sciences community is to move away from paper-based systems, and it’s easy to see why. Some of the challenges posed by and waste associated with paper-based systems can be summarised using the acronym Tim Woods; not a real person, but full of real problems.

T – The “T” stands for “Transport”, which involves the physical movement of paper documentation around the office and around the business, starting at the printer where it is initially created. The documentation is passed around to testers, reviewers and approvers, transported from one person to the next; transported between functions and between physical locations. At the end of all of this, once all approvers have completed the approvals or the paper documentation has served its purpose, it is transported to its final destination, a folder perhaps or a document storage area. But in the life science industry we know that this paper document can be pulled at any time, for example during audits or to support investigations. And so the transportation starts again. Not only is the transportation a huge waste, but how do you protect the document while it is being transported from file to folder and person to person? How do you assure that it will not be mislaid? How do you preserve the integrity of the document in terms of completeness, availability and retrievability?

I – The “I” stands for “Inventory”, or in this case the amount of physical documentation associated with paper processes, for example the physical retention of master copies of documentation as well as obsolete or superseded versions. The stack of physical paper doesn’t take long to become a mountain, which poses challenges when it comes to long-term storage and retention. Many companies within the Life Sciences sector end up outsourcing their long-term storage to a third party, which in itself introduces additional complexities around retention, retrievability and traceability.

M – The “M” represents “Motion”, which in the context of paper documentation not only relates to moving documentation around the business but also the movement of people. For example, if I need to work on the same document as you, then either the document needs to move to me or I need to move to the document, wherever it is: with you or possibly with someone else in the organisation.

W – The “W” stands for “Waiting”. Only one person can work on a physical document at one time. Even if staff are co-located, there is an amount of waiting required for one person to complete their activities before the next person can perform theirs. For example, the review of a physical executed test script for a validation exercise can only be performed by one reviewer at a time. As such the next person in the chain has to wait for the previous person to complete their task.

O – The first “O” stands for Over-production and the second “O” stands for Over-processing. Paper processes, by their very nature, are often laden with inefficiencies. If you take the example of a physical documentation control process for standard operating procedures, the level of work associated with addressing a typo on a single page of a controlled SOP is often equivalent to the level of work associated with a more significant change.

D – The “D” is for “Defects”, which basically amounts to the waste associated with something going wrong. Take the example of the executed test script for a validation exercise.  If the tester makes errors when recording test details in the paper documentation it necessitates a level of additional documentation, explanation and sometimes investigation and rework which ultimately generates more paper.  

S – “S” is for “Skills”. When paper processes are abundant in an organisation, highly skilled people are often under-utilised or poorly utilised, because so much of their labour is spent waiting at printers, scanning documents, stamping documents, filing documents, etc.

We’ve all had our own experiences with paper and its compliance and data integrity challenges, but the question remains: what does the future really look like beyond paper? One potential solution is the introduction of validated workflows. The aim of validated workflows is to eliminate good documentation practice (GDP) errors and have data integrity built in from the start, where you can’t progress to the next step until you satisfy specific workflow requirements. They have been and can be successfully used for the automation and management of validation activities, logbooks, documentation control – essentially anything that has an associated workflow.
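
The “can’t progress until you satisfy the requirements” idea can be sketched in a few lines. This is only an illustration under assumed names (the Step and Workflow classes, the step names and the required fields are all hypothetical), not how any specific validated workflow product is built.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Step:
    name: str
    required_fields: tuple


@dataclass
class Workflow:
    steps: list
    current: int = 0
    records: dict = field(default_factory=dict)

    def complete_step(self, user, **data):
        """Record the current step and advance, but only if every required field is filled in."""
        if self.current >= len(self.steps):
            raise RuntimeError("workflow is already complete")
        step = self.steps[self.current]
        missing = [name for name in step.required_fields if not data.get(name)]
        if missing:
            raise ValueError(f"cannot complete '{step.name}': missing {missing}")
        # Who did it and when are captured automatically rather than written in by hand.
        self.records[step.name] = {
            "completed_by": user,
            "completed_at": datetime.now(timezone.utc).isoformat(),
            **data,
        }
        self.current += 1


workflow = Workflow(steps=[
    Step("execute_test", ("result", "evidence_ref")),
    Step("review", ("reviewer_comment",)),
])

# The first attempt leaves out the evidence reference, so the workflow refuses to advance.
try:
    workflow.complete_step("tester01", result="pass")
    blocked_reason = None
except ValueError as error:
    blocked_reason = str(error)

# With every required field supplied, the step is recorded and the workflow moves on.
workflow.complete_step("tester01", result="pass", evidence_ref="SCR-001")
```

On paper, the missing evidence reference would typically be found at review and corrected with more paper; here the omission is stopped at the point of entry.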

This is exactly what we are planning to cover in our exclusive free webinar. On the 24th of October, Patrick Murray, the Compliant Cloud Technical SME for Pharma VIEW™ (Validated Integrated Enterprise Workflow) will discuss the merits of transforming regulated business from paper-based systems to validated workflows, using some common use cases as inspiration. Now is the best time to enter the world of validated workflows; the new paperless. Discover more here:


Data archiving: the obstacles from lab to shelf

How can we embrace this digitisation of data to ensure that vital and essential data is preserved and accessible for as long as it needs to be while protecting its integrity?

Given the nature of its products and its customers, it follows that the life sciences sector is highly regulated. In fact, the term “pharmaceutical”, per its Greek etymology, “pharmakon”, means both remedy and poison. Hence, before being marketed, pharmaceutical drug products must pass an abundance of different tests and be subject to extensive rules and regulations in order to guarantee safety for customers and patients.

This is not a linear path. The life of the pharmaceutical drug product begins with its discovery, but it doesn’t end with its immediate and quick distribution to those who need it most. From the moment of the initial conception, the path that it follows can be fragmented across different centres, universities and other educational institutes and even across different pharmaceutical companies. This fragmented path results in a vast amount of data production and data sources with complex data property, data custody and data management rights and requirements as well as various data media types. These complexities, coupled with the difficulties associated with identifying and controlling data that requires long-term management and maintenance, represent a significant challenge for pharmaceutical companies today.

FDA (Food and Drug Administration) and European regulations prescribe requirements for data retention and data production, for example the requirement to retain relevant data up to several generations of software and hardware.  Another requirement relates to the retention of pharmaceutical drug product registration-related documentation for as long as a product is on the market plus 10-15 years.  A typical registration submission for a pharmaceutical drug product to a Health Authority consists of a large amount of paper scanned to PDF format, generated from and / or summarising some of the source data.
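
As a small worked example of the “on the market plus 10-15 years” rule, the sketch below computes the earliest date a record could be considered for disposal. The function name and the dates are made up for illustration, and the actual retention period must come from the applicable regulation, not from this code.

```python
from datetime import date


def earliest_disposal_date(withdrawn_on: date, extra_years: int) -> date:
    """Return the date `extra_years` after the product left the market."""
    try:
        return withdrawn_on.replace(year=withdrawn_on.year + extra_years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year, so fall back to 28 February.
        return withdrawn_on.replace(year=withdrawn_on.year + extra_years, day=28)


# Hypothetical product withdrawn from the market on 9 September 2019.
shortest = earliest_disposal_date(date(2019, 9, 9), 10)
longest = earliest_disposal_date(date(2019, 9, 9), 15)
```

Even this simple calculation shows why the media matter: the data must remain readable for the entire time the product is sold plus a decade or more beyond.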

According to Anita Paul (Roche, Basel, Switzerland) and Juerg Hagmann (Novartis, Basel, Switzerland), pharmaceutical drug product registration is gradually becoming paperless and, very soon, paper submissions will no longer be accepted by major Health Authorities. But is the life science sector moving quickly enough in the same direction? The two authors discussed the challenges of digital preservation, which does not just mean the ability to read specific data in preserved (rendition) format but also means the ability to “readily retrieve” all pertinent raw data and metadata.

Digitisation of data is arguably the most effective way to preserve the data content and context, and also to facilitate access and retrievability as required.  Building digitisation of data in at every step along the fragmented path of a pharmaceutical drug product results in easy access and retrieval of accurate data by the right people which contributes to sound quality decisions and ultimately safer products for patients.

So, the question is, how can we embrace this digitisation of data to ensure that vital and essential data is preserved and accessible for as long as it needs to be while protecting its integrity?


Anita Paul (Roche, Basel, Switzerland) and Juerg Hagmann (Novartis, Basel, Switzerland), “Challenges of Long-Term Archiving in the Pharmaceutical Industry”, 2008 (last accessed 09/09/2019).

Ensuring IOT Data Integrity & Security with Identity and Access Management (IAM)


Modestas Jakuska focuses on the importance of using an Identity and Access Management (IAM) system in order to maintain data integrity and security in the context of IOT devices.

Ensuring data integrity means ensuring that data is complete, original, consistent, attributable and accurate. Data must be protected at all stages of its lifecycle: when it is created, transmitted, in use or at rest. Otherwise, there is no assurance that the integrity of current data is maintained.

This is as important for IOT devices (computing devices that connect wirelessly to a network and have the ability to transmit data) as for any other device. IOT devices are used across a variety of industries, including the life sciences industry where they are often employed in the control of drug product manufacturing or equipment monitoring, e.g. IOT sensors monitoring temperature, humidity, light intensity, etc.

There are many considerations for ensuring data integrity for IOT devices including but not limited to:

  • Vendor / Supplier assessment.
  • Verification and definition of the ER (Entity-Relationship) model. 
  • Definition of security protocols used by IOT devices.
  • Definition and verification of the use of cryptography for IOT communication.
  • Definition of procedures for good data management.
  • Identity and Access Management (IAM).

In this post, however, I will solely focus on the importance of using an Identity and Access Management (IAM) system in order to maintain data integrity and security. In the context of IOT devices, an IAM system is a set of policies and technologies that ensures that only specified IOT devices have access to specified resources with appropriate restrictions.
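
At its simplest, “only specified IOT devices have access to specified resources with appropriate restrictions” comes down to a deny-by-default lookup. The sketch below is a toy illustration: the device IDs, resource names and the POLICY structure are assumptions for the example, and a real IAM system would also authenticate each device before trusting its identity.

```python
# Hypothetical policy: device -> resource -> set of permitted actions.
POLICY = {
    "temp-sensor-01": {"telemetry/coldroom-a": {"write"}},
    "gateway-01": {
        "telemetry/coldroom-a": {"read"},
        "config/sensors": {"read", "write"},
    },
}


def is_allowed(device_id, resource, action, policy=POLICY):
    """Deny by default: permit an action only if it is explicitly granted."""
    return action in policy.get(device_id, {}).get(resource, set())


# The sensor may write its own readings but may not read them back.
sensor_can_write = is_allowed("temp-sensor-01", "telemetry/coldroom-a", "write")
sensor_can_read = is_allowed("temp-sensor-01", "telemetry/coldroom-a", "read")
# A device that is not in the policy at all gets nothing.
unknown_can_read = is_allowed("raspberry-pi-x", "telemetry/coldroom-a", "read")
```

The important design choice is the default: a device that nobody has registered is refused, instead of being quietly let in.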

The importance of IAM has been highlighted by the recent NASA hack which occurred specifically due to the mismanagement of IOT devices. According to NASA Office of Inspector General [1]: “JPL uses its Information Technology Security Database (ITSDB) to track and manage physical assets and applications on its network; however, we found the database inventory incomplete and inaccurate, placing at risk JPL’s ability to monitor, report effectively, and respond to security incidents.” (Note: JPL = Jet Propulsion Laboratory.)

No device or network is trivial. That includes even the most basic IOT devices. In fact, a Raspberry Pi (a credit-card-sized computer that plugs into a computer monitor) was used to gain access to the network. Once accessed, a network gateway was then used to gain access to other networks. This could all have been avoided if something like network segmentation had been implemented. According to BBC News [2]: “Once the attacker had won access, they then moved around the internal network by taking advantage of weak internal security controls that should have made it impossible to jump between different departmental systems … The stolen data came from 23 files, but little detail was given about the type of information that went astray.”

After this ‘hack’ NASA implemented measures to address the identified system weaknesses, including but not limited to semi-annual assessment of inventory to ensure that the system components are registered in the Information Technology Security Database (ITSDB).
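
An inventory assessment of this kind is, at heart, a comparison of two lists: what is registered and what is actually seen on the network. The sketch below shows the idea with made-up device names; it is not a description of NASA’s tooling.

```python
def reconcile_inventory(registered, observed):
    """Compare the registered inventory with the devices actually observed on the network."""
    registered, observed = set(registered), set(observed)
    return {
        # On the network but unknown to the inventory: investigate first.
        "unregistered": sorted(observed - registered),
        # In the inventory but not seen: possibly retired, offline or mislabelled.
        "not_seen": sorted(registered - observed),
    }


# Hypothetical device names for illustration only.
report = reconcile_inventory(
    registered=["hplc-07", "temp-sensor-01", "gateway-01"],
    observed=["temp-sensor-01", "gateway-01", "raspberry-pi-x"],
)
```

Run often enough, a check like this turns an unregistered device from a surprise found after an incident into a routine exception report.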

In conclusion, the implementation of and adherence to robust IAM policies and technologies is a crucial element in the preservation of data integrity and security for IOT devices. Failure to implement and follow them exposes the data to the risk of corruption, alteration or destruction.


[1] “Cybersecurity Management and Oversight at the Jet Propulsion Laboratory”, NASA Office of Inspector General, 2019. [Online]. [Accessed: 03-Aug-2019].

[2] “Raspberry Pi used to steal data from Nasa”, BBC News, 2019. [Online]. [Accessed: 03-Aug-2019].

The Crossover of Data Integrity and Data Privacy in the Cloud

With the increased adoption of cloud-based applications in the life science sector, Compliant Cloud CSV Engineer Eliane Veiga details the fundamentals of data integrity and data privacy.

Data integrity (DI) and data privacy (DP) challenges have received increased regulatory attention in recent years. When considering GxP applications, a robust approach to risk-based computerized system lifecycle management requires well-defined processes, use of a qualified infrastructure, validated design and deployment of software, qualified personnel, rigorous change management and version control.

With the increased adoption of cloud-based applications in the life science sector, cloud computing solutions such as Software as a Service (SaaS) offer many advantages including enhanced cost-effectiveness, ease of implementation, and flexible, highly scalable platforms. However, assuring data integrity and data privacy in the cloud requires a well-informed, proactive approach by the regulated organization in planning and maintaining control of their data once it is hosted on the cloud provider’s site.

In Europe, protection of data privacy is now regulated under the General Data Protection Regulation (GDPR), which came into force on the 25th May 2018, replacing the existing data protection framework under the EU Data Protection Directive.

Data Integrity – The Fundamentals

The UK Medicines & Healthcare products Regulatory Agency (MHRA) defines data integrity as “the degree to which data are complete, consistent, accurate, trustworthy, reliable and that these characteristics of the data are maintained throughout the data lifecycle” (MHRA, 2018).

Assuring data integrity requires effective quality and risk management systems which enable consistent adherence to sound scientific principles and good documentation practices. The international regulators have defined an acronym, ALCOA (Attributable, Legible, Contemporaneous, Original, Accurate), as the five elements necessary to assure data integrity throughout the data life-cycle. Even though ALCOA has been widely discussed in many publications, evidence from the US FDA warning letters and EU Statements of Non-Compliance (SNCs) indicates that there are still many who do not understand the fundamentals of ALCOA.

More recent publications, including the WHO Guidance on Good Data and Record Management Practices, have expanded these principles to describe ALCOA+ expectations, which put additional emphasis on ensuring that data and records are “complete, consistent, enduring and available” (WHO, 2016).

Data Privacy – The Fundamentals

The General Data Protection Regulation (GDPR) came into force in the EU on the 25th May 2018, replacing the existing data protection framework under the EU Data Protection Directive. The GDPR emphasizes transparency, security and accountability by both data controllers and data processors, while at the same time standardizing and strengthening the right of European citizens to data privacy.

From a health care and cloud-based solutions perspective, the GDPR brings some significant changes from the previous directive, including:

  • a definition of “sensitive personal data”
  • stricter obligations on both data controllers and processors
  • appointment of a Data Protection Officer (DPO)
  • conducting Data Protection Impact Assessments (DPIA)
  • assuring security of data processing

As data controllers and processors have been allocated shared, stricter responsibilities under the GDPR, the obligations on both controllers and processors have been a surprise for the IT Sector.

Under GDPR, the data controller must implement organizational and technical measures to demonstrate compliance of the processing activities undertaken on their behalf. Furthermore, data controllers have the responsibility for selection and oversight of their service providers (data processors).  The GDPR defines such a data processor as “a natural or legal person, public authority, agency or another body which processes personal data on behalf of the controller”. 

The compliance burden is now shared between processors and controllers. One of the significant requirements that GDPR imposes for processors is that if they intend to hire another processor to assist with data management, e.g. a cloud computing supplier, the data controller must approve this appointment prior to commencement. This requirement is intended to protect personal data from transfer to a third party, even to another country, without the controller’s prior authorization.


As the adoption of digital technology – such as cloud-based solutions – has increased in the life science sector, under the GDPR it will no longer be possible for cloud service providers (processors) to position themselves as mere processors and evade the reach of data protection rules. Recent publications have shown that to achieve assurance of DI in the cloud, service providers must still learn how to comply with the expectations of the GxP regulatory bodies.

From Compliant to Complaint: the human error minefield in the life sciences industry

In particular for the life sciences industry a human error, undetected or unresolved, poses significant risk to the end user of the product.


Nicola Brady tells you how to mitigate risks in the life sciences industry.

To err is human, to forgive divine.  But this forgiveness is not usually forthcoming in industries where a human error can translate into a significant business impact.  Human error imposes significant costs on a business, costs to the quality of the product or service being delivered, financial costs and often reputational costs.

In particular for the life sciences industry a human error, undetected or unresolved, poses significant risk to the end user of the product.  This is why life sciences companies invest so heavily in programs and policies to drive human error down to as low a level as possible. To eliminate it completely is impossible!  Although the world in which we live is moving rapidly towards automation and Artificial Intelligence (AI) technologies, people are still necessary and unfortunately fallible; where there are people there will always be the potential for error.

So, what can a company do to reduce the occurrence of human error or reduce the impact when it occurs?

  • Allow time for training. Initial training and on-the-job training should be in place and appropriate time should be allocated to allow for training.
  • Put robust processes in place. Having comprehensive policies and procedures in place will ensure standard consistent processes are followed and make errors and deviations more detectable. ‘Error proof’ the process as far as practicable. Complex processes should be risk assessed and mitigation actions implemented as required.
  • Ensure the workplace environment is appropriate for the work required. Consider noise levels, lighting, temperature or other environmental factors that might cause distraction.
  • Document it, investigate it, learn from it. Effective investigation processes should be in place to determine root causes and implement corrective actions. The investigation should not stop when a root cause of ‘human error’ is determined; dig deeper and you might find that there was something else at play.  This will allow you to address the underlying causes that contributed to the human error in the first place and reduce the likelihood of its recurrence.
  • Adopt the right culture. There’s no use for ‘blame culture’.  A quality culture where employees are encouraged to ‘raise their hands’ when mistakes occur actually serves to drive the rate of mistakes down.

If a company applies the right focus and attention on training, processes, workplace environment, investigations and overall culture they should find it easy to remain compliant and avoid that complaint!

Data Integrity Challenges and the Cloud

Data Integrity is not a new concept. It has been around since paper and ink were the only ways of doing business. The requirements for electronic data are equivalent to those for paper data. The FDA Glossary of Computer Systems Software Development Terminology defines data integrity as “The degree to which a collection of data is complete, consistent and accurate.”

To assure data integrity good documentation practices and the ALCOA+ principles apply:

  • (A) ATTRIBUTABLE –
    • The data is attributable to the person(s) and/or system(s) that generated it, and includes who did what, why, and when.
  • (L) LEGIBLE –
    • Electronic data is permanently recorded and always available for review and retrieval.
  • (C) CONTEMPORANEOUS –
    • The data is electronically recorded and stored at the time it is generated, with time/date stamps so that the sequence of events can be easily followed.
  • (O) ORIGINAL –
    • The original source data as well as copied records are preserved. Copies, including backup/archive copies, must be verified as accurate and true, preserving the content and meaning of the original, with the data traceable to its origins.
  • (A) ACCURATE –
    • Where results are recorded electronically, it is essential that they are generated by validated systems.
  • (+) The data must be CONSISTENT, all records must be COMPLETE, including any metadata (contextual information required to understand the data) and data must be ENDURING & AVAILABLE.

Whether an organisation’s electronic data is stored on internal servers or in the cloud, the ALCOA+ data integrity principles apply. The challenges associated with preserving data integrity are many, but specifically in the world of cloud computing, data integrity is one of the biggest challenges to overcome.
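
Several of the ALCOA+ principles listed above translate directly into metadata that a system can capture and check. The following is a rough sketch only: the field names and the choice of SHA-256 are assumptions made for illustration, and a validated system would also protect the record itself from being altered.

```python
import hashlib
from datetime import datetime, timezone


def make_record(user, payload: bytes):
    """Capture who recorded the data, when, and a fingerprint of the original content."""
    return {
        "recorded_by": user,                                     # attributable
        "recorded_at": datetime.now(timezone.utc).isoformat(),  # contemporaneous
        "sha256": hashlib.sha256(payload).hexdigest(),           # fingerprint of the original
    }


def is_true_copy(record, copy_bytes: bytes) -> bool:
    """Check that a backup or archive copy matches the original byte for byte."""
    return hashlib.sha256(copy_bytes).hexdigest() == record["sha256"]


# Hypothetical lab result and two copies, one of them altered.
record = make_record("analyst01", b"pH=7.02")
good_copy_ok = is_true_copy(record, b"pH=7.02")
altered_copy_ok = is_true_copy(record, b"pH=7.20")
```

The same check works whether the copy sits on an internal server or with a cloud provider, which is the point: the principles do not change with the hosting model.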

When an organisation outsources their data and applications to the cloud they are handing over control. Will the data be safe and secure, protected from loss or damage and protected from unauthorised access or manipulation?

It is important to remember that any Cloud based system is only as secure as its weakest component. Can the Cloud Service Provider assure that there are controls to prevent data loss or manipulation? What will happen if there is a data breach/data hacking incident? Where will the data / application be stored? How will the security of interfaces be assured? Will the access to the data be by authorised personnel only with full audit trail availability? Will there be multitenancy in the cloud? Based on this, the vetting of potential Cloud Service Providers needs to be diligent and robust agreements should be implemented to ensure appropriate controls, checks and balances (e.g. Validation, Backup, Disaster Recovery, Access Controls, Audit Trails, etc.) are in place to assure data confidentiality, data integrity and data availability.

A cooperative relationship between the cloud service provider and the organisation is key to assuring that data integrity is preserved for any data and / or applications stored in the cloud. Once this relationship is established, maintained and monitored, and appropriate checks and balances are put in place, the data integrity challenges associated with ‘moving to the cloud’ should not be so great after all. In fact, compliance delivered at the cloud level can be hugely instrumental in future proofing the business models of highly regulated companies – those in Life Sciences, Connected Health and many other sectors.