Operating in Permanent Beta: How Can Organisations Cope?

Kim van Oorschot, Luk Van Wassenhove

Today’s digital services are delivered in “permanent beta”, with continuous fine-tuning and updating, such as app updates. How does this affect the way companies measure service performance?

Every day, companies in industries such as telecommunications, banking and insurance serve millions of consumers through digital services, mostly in the form of automated processes. Behind these services is a network of digital service companies – from the modem provider to the broadcast service, middleware and content provider – which interact with one another.

Over the past decade, the IT-enabled digital service supply chain has transformed into a digital service network of independent companies continuously delivering new innovations. In this complex network structure, each company's software may contain bugs that can disrupt the primary service, either independently or through interactions with other services or programs in the network.

The fact that failures can arise anywhere in the network, independently or in combination with other services, means that it can be difficult to detect the causes of failure before a new service is deployed. As a result, while new products previously underwent a phase of beta testing by end-users to uncover issues prior to release, today’s digital services are delivered in “permanent beta”, with continuous fine-tuning and updating, such as app updates.

For companies operating in this permanent beta reality, the concept is perhaps nothing new. However, the implications for the company’s strategy, performance measures and resource allocation are less understood. In a study, we examined the repercussions for the company's quality assurance (QA), operations and training teams. 

From complicated to complex products

The digital services environment has evolved from a complicated to a complex one. The conventional product development process, although complicated, is structured and typically sequential, with a clear relationship between action, design and outcome. In this context, QA experts would ensure the software's performance by identifying and eliminating most bugs before the service is put in the hands of the consumer.

On the other hand, digital product development today involves an increasing number of stakeholders and subsystems in a digital service network. The complexity makes it either impossible or cost-prohibitive to identify all potential faults and the origin of those faults before launching a service. When it is near impossible to eliminate all bugs before the service reaches the consumer, performance measurement through QA is rendered ineffective.

In addition, the digital service landscape is not only complex, but also highly competitive: Consumers expect both high levels of innovation and reliability. Businesses are therefore not only under pressure to release services quickly, but also to adapt and renew their services in line with rapid technological development.

However, innovativeness and reliability are sometimes in conflict with each other. Continuous updates that add new features can also introduce new software bugs, reducing reliability at the same time. When service incidents do occur, companies are expected to recover services swiftly, or risk losing consumer confidence and loyalty. This adds more urgency and complexity for service providers.

Overall, complexity is determined by the number of digital service providers in the network, the complexity of the product and the speed of change (which is particularly crucial for sectors such as electronics, software and vaccine production during a health crisis).

Business not as usual

Despite the digital services environment becoming increasingly complex, not all organisations recognise the need to realign their strategy and performance measures. In our study, we sought to uncover the implications for resource allocation – specifically, the difference between performance measurement before and after the release of digital services – and the role of employees.

In a longitudinal study of TeleSP, a European telecom company, we collected empirical data on its digital TV service over nine years. To provide a more holistic view, we built a model with stock and flow diagrams to identify general conditions or situations when things could go wrong, in particular when environments evolve from complex to chaotic. We included parameters such as the flexibility of assigning human resources between the QA, operations and training teams.
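The study's actual model is far richer, but the basic stock-and-flow logic – bugs flowing in with each release, and staff flexibly reallocated between pre-release QA and post-release field operations – can be sketched as a toy simulation. All names, rates and parameter values below are hypothetical, chosen only to illustrate the mechanism:

```python
# Toy stock-and-flow sketch (illustrative only, not the study's model):
# two stocks (pre-release bug backlog, bugs escaped into the field) and
# a staffing share that can shift between QA and operations.

def simulate(weeks=52, staff=10, flexibility=0.5,
             bugs_per_release=20.0, catch_rate=0.08, fix_rate=0.10):
    pre_release = 0.0   # stock: bugs not yet caught by QA
    field = 0.0         # stock: bugs escaped into the field
    qa_share = 0.5      # fraction of staff assigned to QA
    history = []
    for _ in range(weeks):
        pre_release += bugs_per_release              # inflow: weekly release
        total = pre_release + field
        # flexible staffing: shift people toward the larger backlog;
        # flexibility=0 means a rigid 50/50 split
        target = pre_release / total if total else 0.5
        qa_share += flexibility * (target - qa_share)
        qa_staff = qa_share * staff
        ops_staff = staff - qa_staff
        caught = min(pre_release, catch_rate * qa_staff * pre_release)
        pre_release -= caught
        escaped = 0.3 * pre_release                  # a share of bugs ships
        pre_release -= escaped
        field += escaped
        fixed = min(field, fix_rate * ops_staff * field)
        field -= fixed
        history.append(field)                        # field backlog over time
    return history

rigid = simulate(flexibility=0.0)
agile = simulate(flexibility=0.8)
```

Comparing the `rigid` and `agile` runs shows how the flexibility parameter changes where backlogs accumulate; the study's model explores such dynamics with empirically grounded parameters.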

Findings reveal that due to the increased complexity, traditional performance measurement methods focused on detection of software bugs before release become less effective. Since companies can no longer assume that they can predict what will go wrong and that they can solve issues before the service is launched, it might be more effective to release the service, monitor its performance and solve problems promptly in the field as they arise. A common example is updating mobile apps whenever software patches are introduced to fix issues.

This strategy runs counter to prevailing industry practice. As traditional methods like QA become less important, organisational change is needed. Our research shows a shifting dominance of organisational processes in response to increasing levels of complexity. Organising for permanent beta also implies the need to rethink decisions about resources.

Reorganise and reskill

Ensuring that an organisation is ready for permanent beta involves organisational as well as people-related aspects. First, leaders need to recognise that the business landscape is evolving from complicated to increasingly complex or even chaotic.

More concretely, steps need to be taken on two fronts: the organisation and the people. Organisational agility is needed to allow employees to be deployed across front-end roles (post-sales support) and back-end roles (QA and innovation) to better meet manpower needs in a timely manner. With increasing complexity, more service issues are expected to arise in the field, which requires more performance monitoring after releasing the service.

Digital service providers must have the flexibility to reduce pre-release QA activities and reallocate QA employees to post-release performance monitoring and service recovery in the field as needed. This means that employees need to be trained to be effective in monitoring and anticipating problems, as well as remedying the issues promptly in the field, in addition to QA tasks.

It is important to note that while innovation is widely celebrated, companies need to consider the impact on the wider organisation and employees. Innovation cannot happen at a rate where inadequate resources are allocated to manage issues that arise in the field or employees suffer from burnout. Companies need to anticipate potential issues that may arise, increase manpower assigned to post-sales technical support accordingly, as well as reskill employees in line with innovation.

The above aspects need to be managed effectively; otherwise, inadequate capacity to address issues arising in the field can create bottlenecks in post-sales support and cause employee burnout, as too few people struggle to meet the demand for support.

Taking the pulse of performance

To ensure post-sales service reliability, digital service providers are moving towards condition monitoring by deploying data-driven automated systems to continuously monitor the performance of the service. This ensures that crucial indicators of performance are kept within a healthy range. Should one of the critical parameters be out of acceptable range, it signals the need to address a potential issue. This is akin to wearing a fitness tracker on your wrist to continuously monitor certain health indicators and provide early warning signs before an ailment arises.

Compared to routine maintenance such as annual vehicular inspection (which may not always be needed) or undertaking corrective maintenance only after service failure occurs, this approach is not only more cost-effective, but also potentially less disruptive to the user. Other industries where performance and uptime are crucial – such as chemical and energy industries, infrastructure, aerospace and shipbuilding – have been moving toward maintenance based on continuous monitoring of the equipment’s condition, enabled by the greater availability of data through IoT.

Each company’s management will need to decide what parameters to measure, how to measure them and what the acceptable range is. Performance monitoring is most effective in predicting the probability of failure when automated data collection and analysis are combined with human expertise for data interpretation. Investing in human expertise is just as important as investing in automated processes.
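The core of such condition monitoring – comparing each measured parameter against its acceptable range and raising an early warning when it drifts out – can be sketched in a few lines. The metric names and thresholds below are illustrative assumptions, not values from the study:

```python
# Minimal condition-monitoring sketch: flag any metric outside its
# healthy range. Metrics and ranges are hypothetical examples.

HEALTHY_RANGES = {
    "error_rate": (0.0, 0.02),       # fraction of failed requests
    "latency_ms": (0.0, 250.0),      # p95 response time
    "stream_bitrate_mbps": (3.0, 50.0),
}

def check_metrics(sample: dict) -> list[str]:
    """Return alerts for metrics outside their healthy range."""
    alerts = []
    for metric, value in sample.items():
        low, high = HEALTHY_RANGES.get(metric, (float("-inf"), float("inf")))
        if not (low <= value <= high):
            alerts.append(f"{metric}={value} outside [{low}, {high}]")
    return alerts

print(check_metrics({"error_rate": 0.05, "latency_ms": 120.0}))
# prints ["error_rate=0.05 outside [0.0, 0.02]"]
```

In practice, such checks would run continuously against streaming telemetry, with the harder work – choosing the right parameters and ranges – resting on the human expertise the paragraph above describes.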

Overall, there are many moving parts in the increasingly complex digital services environment. Ultimately, businesses need to align performance measurement with elements such as business strategy, organisational culture and the external environment.

The text was originally published by INSEAD Knowledge: https://knowledge.insead.edu/operations/operating-permanent-beta-how-can-organisations-cope

Published 2 May 2023
