Key takeaways:
The lack of a deliberately designed IT/OT architecture turns quick workarounds into technical and organisational debt. The result is downtime, disputes over responsibility, and higher risk during modernisation and conformity assessment.
- IT/OT architecture has become a design decision that affects costs, organization, and process availability.
- Stopgap integrations help during commissioning, but later increase the cost of changes, audits, incidents, and expansion.
- Three criteria are key: the time available for a safe change, the owner of each data exchange, and an analysis of the impact of failures on production.
- When integration affects stopping, energy isolation, or restart interlocks, it falls within the scope of functional safety.
- Temporary solutions should have an owner, withdrawal conditions, documentation requirements, and reassessment criteria.
Why this matters today
Factory expansion is becoming less and less about adding one more machine or commissioning another line in isolation. It usually means enlarging an environment in which production systems, maintenance, quality, planning, warehousing, and management reporting must exchange data and all influence process availability. In that setup, IT/OT architecture stops being a technical issue to “sort out later” and becomes a design decision with financial and organisational consequences. Temporary integrations work during commissioning because they solve an immediate problem: they quickly connect a new machine, export a few signals to a report, or work around the limitations of an older controller. The payback comes a few years later, when the plant tries to increase output, meet new compliance requirements, or safely change how the installation operates. At that point, it becomes clear that the problem is not a single cable or script, but the lack of consistent rules for communication, responsibility, and functional separation.
The biggest mistake is to treat such solutions as cost-neutral. They only defer the cost, and usually until the worst possible moment: during expansion, an audit, an incident, or a supplier change. From a project perspective, the result is not just a more expensive implementation of the next phase, but also a loss of predictability. The team no longer knows which dependencies are critical to production continuity, where the integrator’s responsibility ends and the plant owner’s begins, or which changes require a renewed risk assessment in the project. In practice, this is exactly where the hidden costs of poor design decisions begin: additional downtime, ad hoc service work, repeated acceptance tests, difficulty documenting changes, and disputes over warranty scope. If the architecture has not been defined as a deliberate model for factory development, every subsequent phase will carry technical and organisational debt.
A good practical test is not whether the integration “works”, but whether it can be changed safely and predictably two or three investment stages later. If a new line requires manual signal mapping in several places, knowledge of the connections is scattered across suppliers, and reconstructing the full data path requires analysing controller code, intermediate databases, and undocumented services, then the project has already entered a higher-risk path. It is worth assessing the situation against three measurable criteria: the time needed to implement a controlled change, the ability to clearly identify the owner of each data exchange, and the ability to trace the impact of a failure or modification on production and safety. If these three points cannot be clearly established, the issue is not team convenience, but control over the entire undertaking.
A recurring example from practice is this: a plant launches a new production area and, to enable a fast start, connects process data to business systems through intermediate solutions created outside the target architecture. For a while, everything appears to work because the data flow is sufficient for reporting and day-to-day supervision. The problem arises during further automation of production processes, integration with maintenance, or a change in machine operating logic. Then a single modification in the operational layer affects reports, alarms, recipes, or remote access, and the dependencies are no longer obvious. If the solution also interferes with functions related to stopping, energy isolation, or preventing restart, the issue is no longer purely an IT matter. It moves into the area of functional safety and requires a separate analysis, including verification of whether the assumptions for protection against unexpected start-up have been compromised. This is where IT/OT architecture directly intersects with risk analysis in a factory development project and with decisions that later also affect the scope of conformity assessment and technical documentation.
That is why this issue requires a decision now, not after commissioning is complete. Not because every integration must be fully developed from the outset, but because from the start it is necessary to distinguish between a temporary solution and one that is intended to become part of the plant’s permanent architecture. That distinction should have project consequences: a separate decision owner, conditions for withdrawing the workaround, documentation requirements, and criteria for reassessment during expansion. If the plant is planning further investment stages, machine upgrades, or preparation for conformity assessment, the lack of such a distinction almost always increases the cost of change and expands the investor’s scope of responsibility. That is exactly why IT/OT architecture is no longer an add-on to the project, but one of the conditions for keeping cost, schedule, and risk under control.
Where cost or risk most often increases
The most expensive part of factory development is usually not the IT/OT interfaces themselves, but the consequences of decisions made “for the time being” that, a few years later, end up serving as permanent architecture. A temporary integration becomes costly not because it was technically flawed, but because no one defined its boundaries: who is responsible for changes, which data is the source of truth, how the configuration is to be restored after a failure, and when the workaround is to be removed. In practice, costs rise when a temporary solution finds its way into maintenance, production, quality, or management reporting without a formal decision that it has become a critical element. For the project, this means later disputes over budget and scope; for the organisation, it also means blurred accountability: the failure looks like a technical issue, even though its root cause was an unresolved architectural decision. A useful assessment criterion here is a simple question: after the plant is expanded, can you identify the process owner, the data owner, and a safe change procedure without involving “the only person who knows how it works”? If not, the risk is already built into the project.
A second source of escalating costs is the lack of separation between the control layer and the business data exchange layer. In the first phase of an investment, this shortcut can be tempting: the same server handles communication with the machine, archives data, feeds the report, and provides remote service access. On a single line, this may appear to work well enough, but in later expansion stages every change made for one purpose affects the others. An update required by a corporate system may disrupt production continuity, and the need for faster reporting may lead to interference with the configuration of devices that had previously been operating stably. At that point, the hidden costs of poor design decisions are not limited to additional hardware purchases or integrator services. Far more painful are the costs of downtime, repeated testing, night work during implementation, and the need to reconstruct knowledge that was never documented anywhere. From a project management perspective, the sensible minimum is to assess whether a failure or change in the IT part can stop an operational function of the machine or line. If the answer is yes, the architecture needs to be corrected, regardless of the fact that “it works for now”.
A typical example appears when new machines are connected to an existing plant infrastructure. The supplier commissions the equipment quickly because acceptance and production start-up are needed, so communication with plant systems is handled through an additional computer, a script exporting files, or a manually modified signal map. After a year, another machine is added; after two years, the supervisory system changes; after three years, it turns out that no one can clearly describe which messages are critical to the process, which are used only for reporting, and which matter for diagnostics or batch traceability. At this point, the issue partly moves into the area of creating machine operating manuals, because if the operator, maintenance team, or service personnel do not have documented procedures for communication loss, manual override, or restoring parameters after a component replacement, the problem is no longer purely an IT matter. It becomes part of the organisation of safe operation and of later responsibility for how the machine is used and modified.
Only at this stage does it become clear why the issue also returns in conformity assessment, technical documentation, and change budgeting. If the integration affects machine functions, interlock logic, the way statuses are acknowledged, or the information provided to the user, a new risk analysis may be required, along with verification that the documentation still reflects the actual solution. The scope of that assessment depends on the nature of the change, so it cannot honestly be settled with a single universal statement, but that is precisely why temporary fixes are so costly: they make it harder to determine what was actually changed and what the legal and operational consequences are. For the decision-making team, the practical criterion is this: if a change in the integration cannot be described in the configuration documentation, the test procedure, and the operating rules without relying on informal knowledge, then the project has already entered a zone where not only technical cost is increasing, but also the responsibility of the investor, the project manager, and the people approving the solution for operation.
How to approach the issue in practice
In practice, the question is not whether to integrate IT and OT faster, but where to draw the line between a temporary solution and architectural debt that will block factory development in a few years. Temporary connections usually arise under commissioning pressure: data must be pulled from a machine quickly, a new line must be added, the quality system must be linked to production records, or remote service access must be provided. The problem begins when a solution implemented “for the time being” becomes the basis for subsequent design decisions. The team loses a clear division of responsibilities, and every expansion requires knowledge to be reconstructed from correspondence, local settings, and operator practice. This is no longer a minor technical inconvenience, but a factor that affects the schedule, the cost of change, and the ability to demonstrate who approved a given solution for operation and on what basis.
That is why the right approach starts with an architectural decision, not with choosing a tool. The manager or area owner should require every new integration to have a defined operational purpose, an owner on both sides of the IT/OT boundary, and agreed support conditions after go-live. If it is unclear who is responsible for the data source, who approves configuration changes, who tests the impact on the process, and who decides on fallback mode, then the project is effectively pushing risk into the operational phase. This is where the project manager’s role in IT/OT decisions naturally begins: not as a schedule coordinator, but as the person who forces responsibility to be settled before a temporary fix is written into the budget and timeline as a “quick workaround”. A practical assessment criterion is simple: if the planned integration cannot be maintained after a supplier change, controller replacement, or line expansion without the involvement of the person who created the original workaround, then it is not a temporary solution but a future project cost.
A good test case is the expansion of an existing line with an additional station that is expected to send data to a higher-level system while also responding to statuses from the part already in operation. If the team decides to connect signals directly and use informal data translation “because it will be faster”, everything may initially work correctly. Over time, however, side effects appear: it becomes harder to determine whether an error comes from the machine logic, the communication layer, or the reporting application; acceptance testing covers only standard scenarios; upgrading one element forces changes in several places at once. This is also when the hidden costs of poor project decisions become visible: extra downtime for diagnostics, costly integrator involvement for every change, disputes over warranty scope, and delays in later stages of the investment. It is therefore worth measuring not only commissioning time, but also the number of integration points requiring manual configuration, the time needed to analyse an incident after a change, and the number of changes that must be tested end-to-end rather than locally.
Only in that context does it make sense to refer to safety and compliance requirements. If the integration affects machine operating states, interlocks, signal acknowledgements, or the startup or shutdown sequence, it is no longer a neutral IT add-on. Depending on the nature of the change, this may trigger the need for a new project risk analysis, an update to the technical documentation, and verification that the operating method still matches the assumptions adopted for the machine or line. This is particularly clear where the integration layer begins to indirectly affect conditions for safe access, energy isolation, or prevention of unexpected startup. In such a case, the architectural decision moves from implementation convenience into the area of legal and technical responsibility. If the team cannot show which connections are purely informational and which affect machine behaviour, that is a sign the issue should be taken out of the “systems integration” category and treated as a change that matters for safety, budget, and the responsibility of those approving the solution.
What to watch out for during implementation
Most problems do not stem from IT/OT integration itself, but from treating it in the project as a quick way to launch a new function rather than as a permanent part of the factory architecture. That is exactly when temporary connections come back to bite a few years later: during line expansion, controller replacement, a change of higher-level system supplier, or a safety audit, it turns out that no one can clearly identify the interface owner, the rules governing its operation, or the consequences of failure. For the project, this means not only the cost of technical debt, but also an organisational cost: more coordination, longer end-to-end testing, more difficult acceptance, and a greater risk that delays will appear only at the very end, when the schedule is least flexible. At this point, the issue naturally moves into the area of hidden costs of poor project decisions, because the source of the problem is not a single execution error, but the decision to postpone proper architecture until later.
During implementation, it is therefore worth assessing the solution not by whether it “works now”, but by whether it can be maintained and changed safely in a predictable way. The practical criterion is simple: if the planned integration does not have a defined scope of responsibility, failure mode, versioning rules, and a post-change test procedure, then it is not yet ready for production deployment, even if it works at a test station. This is especially important where the same interface is expected to support both the current stage of the investment and future expansion. Factory development almost always increases the number of dependencies between systems, and temporary fixes perform worst precisely when the number of exceptions, workarounds, and local arrangements grows. From the project manager’s perspective, this means the need to decide early who approves boundary decisions between automation, maintenance, IT, and compliance, because without that, responsibility becomes blurred exactly where the biggest disputes over cost and schedule later arise.
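The four readiness criteria named above (defined responsibility, failure mode, versioning rules, post-change test procedure) can be expressed as a deployment gate. The field names below are hypothetical, chosen for the example rather than taken from any standard.

```python
# Hypothetical readiness gate; the four field names are assumptions for illustration.
REQUIRED_FIELDS = ("responsible_party", "failure_mode", "version_rule", "post_change_test")


def ready_for_production(integration: dict) -> bool:
    """An integration is deployable only when all four criteria are documented."""
    return all(integration.get(field) for field in REQUIRED_FIELDS)
```

An integration that works at a test station but fails this gate is, in the terms used above, not yet ready for production deployment.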
A typical real-world example is adding data exchange between a line and a reporting system through an intermediate script or an undocumented service running on a server that “is already on site”. At commissioning, the solution seems reasonable: it does not require changes on the machine supplier’s side, shortens implementation time, and makes it possible to show a business result quickly. The problem comes later. After an operating system update, a change in addressing, a backup restore, or device replacement, no one can be sure that the signal mapping logic still reflects the actual process. If that mechanism is involved in acknowledgements, interlocks, job queuing, or start conditions, the failure stops being an IT incident and starts affecting line availability, production quality, and responsibility for approving the solution for operation. At that point, the issue naturally moves into risk analysis in a project, because you need to assess not only the probability of failure, but also the consequences of incorrect information, an incorrect sequence, and an incorrect operator response.
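One low-cost defence against the scenario above is keeping the signal map itself as reviewable, versioned data rather than as logic buried in a script. The format below is a sketch under assumed conventions: the signal names, the keys, and the split between "control" and "reporting" purposes are all hypothetical.

```python
# Hypothetical signal-map format: each entry documents purpose, owner, and
# fallback behaviour, so the mapping survives a supplier or server change.
SIGNAL_MAP = {
    "line1.conveyor.ack": {"purpose": "control", "owner": "automation", "fallback": "hold_and_alarm"},
    "line1.oee.counter": {"purpose": "reporting", "owner": "it", "fallback": None},
}


def validate(signal_map: dict) -> list[str]:
    """Flag signals whose documentation would not survive a change or restore."""
    problems = []
    for name, meta in signal_map.items():
        if meta.get("owner") is None:
            problems.append(f"{name}: no owner")
        # A signal that affects machine behaviour must define a fallback mode.
        if meta.get("purpose") == "control" and not meta.get("fallback"):
            problems.append(f"{name}: control signal without fallback mode")
    return problems
```

Running such a check after every restore or device replacement answers the question the paragraph above raises: whether the mapping logic still reflects the actual process, and who answers for it when it does not.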
Only in that context does it make sense to refer to formal requirements. If the integration layer remains purely informational and this can be demonstrated technically, the scope of obligations will be different than in a situation where it affects the behaviour of the machine or line. However, if it influences operating logic, start conditions, stopping, acknowledgements, or bypasses, the implementation must be treated as a change with technical significance and potentially safety significance, not as a routine system expansion. This may mean the need to re-check the assumptions of the risk assessment, the technical documentation, and the conformity assessment conditions adopted for the solution. In practice, the safe question is not “can this be connected”, but “after implementation, will we be able to prove what this interface does, who is responsible for it, and how we control the change”. If the answer is not clear-cut, the cost of a postponed architectural decision will usually return during the next upgrade, certification, or incident, and by then it will no longer be only a technical problem, but also a management one.
FAQ: Factory growth and IT/OT architecture – why do makeshift integrations come back to bite you after a few years?
Why do temporary integrations become a problem years after commissioning?
During commissioning, they solve the immediate problem, but over time they become part of the permanent architecture without clear change-control rules or defined ownership. This increases the cost of expansion, audits, service, and fault rectification.
What are the warning signs that an integration has become architectural debt?
Warning signs include manually mapping data in multiple places, fragmented knowledge of the connections, and a lack of complete documentation of the data path. The risk also increases when it is not possible to quickly identify the owner of the data exchange and the impact of a change on production.
How can you tell whether an integration is still under control?
Three practical criteria apply: the time required to implement a controlled change, the ability to clearly identify the owner of each data exchange, and the ability to reconstruct the impact of a failure or modification on production and safety. If these elements cannot be pinned down, the project becomes unmanageable.
When does an integration become a functional safety issue?
When a solution affects functions related to stopping, energy isolation, or prevention of restart, it falls within the scope of functional safety and requires a separate risk assessment.
What should be decided before commissioning?
At the outset, it must be determined whether a given solution is a workaround or part of the plant's permanent architecture. That distinction should have design implications: who owns the decision, the withdrawal conditions, documentation requirements, and the rules for reassessment during expansion.