Key takeaways:
- REST fits exchanges that are "question and answer, here and now"; queues, brokers, and event-driven communication fit exchanges that must "record the fact and ensure it is handled even under disruption."
- The choice is not technically neutral: it determines where the project absorbs cost, delay, and operational responsibility.
- When a message can change the behavior of a machine, process, or protective measure, the integration decision becomes part of risk assessment according to ISO 12100.
The question of whether a REST API is suitable for industry is no longer a debate about a preferred integration style. It is a decision about where the project will absorb cost, delay, and operational responsibility. In an industrial environment, a communication interface very quickly stops being just a “technical layer” and starts affecting process continuity, repeatability, auditability, and incident response. REST works well where you need a simple call, a clear response, and transparent control over the request state. The problem starts when the system must keep operating despite the temporary unavailability of one participant, when messages must be delivered with confirmation, or when a single event must trigger effects in several independent areas. At that point, the choice between a synchronous request and a queue, broker, or event-driven communication model is no longer technically neutral.
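The commitment described above can be reduced to a few lines. This is a minimal sketch, not a real HTTP client: `call_sync`, `ServiceUnavailable`, and the `service` callable are hypothetical stand-ins for a request with a timeout. It shows the defining property of the synchronous model: until a reply arrives, the caller's own state is unresolved, and an unavailable callee stops the sequence at exactly this point.

```python
class ServiceUnavailable(Exception):
    """Raised when the callee does not answer within the allowed time."""


def call_sync(service, request):
    """Blocking request-response call.

    The caller gets either a definitive reply or an exception; there is
    no third state in which the message is safely retained for later.
    """
    reply = service(request)  # stands in for an HTTP request with a timeout
    if reply is None:
        raise ServiceUnavailable(request)
    return reply
```

The strength and the weakness are the same line: the caller always knows the outcome immediately, but only as long as the other side answers.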
This matters now because more and more industrial projects are linking control, maintenance, quality systems, production reporting, and external services into a single chain of dependencies. If the architecture relies only on synchronous calls, the team often ends up with a system that looks simple but becomes fragile as the number of integrations grows, the network becomes unstable, or a strict event trail is required. The cost of that decision does not show up during a feature demo. It appears later: when processes are blocked by an unavailable component, when incident reconstruction is difficult, when system states have to be reconciled manually, and when teams argue over whether an operation was actually executed or merely requested. For the product owner and project manager, the practical criterion is simple: decide whether a given data exchange is “question and answer, here and now” or rather “record the fact and ensure it is handled further even under disruption.” That answer determines not only the technology, but also the responsibility model between teams.
In practice, this is easy to see in machine systems where a single operator action or process event must be recorded, passed on, and confirmed in several places. If a supervisory application sends synchronous requests to successive services and waits for the full set of responses, a temporary issue in one element can stop the entire sequence, even though some business effects should occur independently. By contrast, a broker or queue makes it possible to separate the moment information is accepted from the moment it is processed, preserve the event trail, and manage retries after an error more easily. That does not mean event-driven communication is always better: if an immediate decision is needed to block further machine movement, or the operator must receive a definitive response straight away, an asynchronous model without well-designed intermediate states can increase uncertainty. That is why, from the start of the project, it is worth measuring not only response time, but also the number of lost or duplicated messages, the time needed to reconcile inconsistent states, and the ability to reconstruct the sequence of events after an incident.
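The separation of "accepted" from "processed" described above can be sketched in a few lines. All names here are hypothetical and the in-memory deque stands in for a durable queue or broker: the point is that `accept` returns as soon as the fact is recorded, while retries and the event trail belong to the consuming side.

```python
from collections import deque

events = deque()   # stands in for a durable queue or broker
audit_trail = []   # ordered record of what happened to each event


def accept(event):
    """Fast path: record the fact, do not wait for processing."""
    events.append(event)
    audit_trail.append(("accepted", event["id"]))


def process_all(handler, max_retries=3):
    """Slow path: drain the buffer, retrying failed events."""
    while events:
        event = events.popleft()
        for attempt in range(1, max_retries + 1):
            try:
                handler(event)
                audit_trail.append(("processed", event["id"]))
                break
            except Exception:
                audit_trail.append(("retry", event["id"], attempt))
        else:
            # Exhausted retries: park the event instead of losing it.
            audit_trail.append(("dead-lettered", event["id"]))
```

Even if the handler fails on the first attempt, the sender's `accept` has already succeeded, and the audit trail shows exactly what happened and in what order.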
This naturally connects with risk assessment according to ISO 12100 in industrial projects, because the choice of communication method affects the consequences of an error, the detectability of irregularities, and the ability to implement effective risk reduction measures. If the interface carries functions whose incorrect execution could lead to unintended start-up, a dangerous change of state, or loss of control over energy, the issue stops being purely an IT matter and moves into machine system design and the assessment of protective measures. This is also the point at which related issues must be considered, including hazard identification according to ISO 12100 and practical risk assessment using the adopted methodology. In other words, the decision on REST, a queue, or a broker should be made not after the integration demo, but when the team can clearly define the consequences of an incorrect or delayed message for the process, safety, and accountability.
Where cost or risk most often increases
Most poor decisions do not come from choosing the “wrong technology,” but from assigning a REST API to tasks it was not designed for. In industry, costs rise when a request-response interface is expected to carry communication that is sensitive to temporary unavailability, event order, or the need for reliable confirmation of execution. If the system only needs to read the current state of a device or accept a command whose absence can be easily detected and retried without side effects, REST may be sufficient. But if the outcome depends on whether the message arrived exactly once, in the correct order, and with the ability to reconstruct the history after an incident, the cost of working around REST’s limitations quickly outweighs the apparent simplicity of implementation. In practice, that means extra retry logic, custom buffering mechanisms, reconciliation of inconsistent states, and more difficult accountability when a device performs an operation later than expected or performs it twice.
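The "extra retry logic" mentioned above has a failure mode worth seeing concretely. This is a sketch under simplified assumptions (hypothetical names, a flag simulating one lost response): a naive retry wrapper around a non-idempotent command executes the side effect twice when the reply is lost *after* the receiver already acted.

```python
import time


def retry(call, attempts=3, base_delay=0.0):
    """Retry a call with exponential backoff; re-raise after the last attempt."""
    last_error = None
    for attempt in range(attempts):
        try:
            return call()
        except Exception as err:
            last_error = err
            time.sleep(base_delay * 2 ** attempt)
    raise last_error


executions = []
_network = {"lose_next_response": True}  # simulates one lost reply


def move_axis(step_id):
    """Non-idempotent command: the side effect happens before the reply."""
    executions.append(step_id)  # the device acts here
    if _network["lose_next_response"]:
        _network["lose_next_response"] = False
        raise TimeoutError("response lost after execution")
    return "ok"
```

Calling `retry(lambda: move_axis("step-7"))` returns `"ok"`, yet the device has executed the step twice. Avoiding this requires deduplication keys or idempotent command design, which is exactly the hidden cost the paragraph above describes.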
At the design stage, the issue usually seems harmless: the team assumes a stable network, constant service availability, and a clearly defined state on both sides of the integration. In an industrial environment, those assumptions rarely hold for long. Connectivity drops, a device restarts, an intermediate system is updated, or there is simply overload during a production shift change. At that point, an architecture based solely on synchronous calls starts shifting risk onto applications and operators. Project costs rise not only because of software fixes, but also because of recovery testing, additional operating procedures, and disputes over which side “should have known” that the request had not been executed. The practical decision criterion is simple: if receiver unavailability must not stop the sender, and the message has to be safely retained and handled later, then a queue, broker, or event-driven communication model should be seriously considered instead of pure REST.
A good example is the integration of a supervisory system with a line where one system orders a recipe change and several others must accept it, confirm it, and apply it at the correct point in the cycle. With REST, it is easy to build a “set parameters” call, but harder to ensure that all relevant components received the same information, that an older message will not overwrite a newer one, and that after a failure it will still be possible to determine who saw which command. An event broker or queue structures this problem differently: the message becomes a durable fact in the system, one that can be traced, reprocessed, and independently consumed by multiple recipients. This is not just a technical choice. It determines whether, in the event of a batch complaint, downtime, or an incident, the course of the system’s decisions can be demonstrated and responsibility assigned accordingly, whether contractual, operational, or internal. Where accountability matters, it is worth measuring not only response latency, but also the number of messages requiring retry, the time needed to reconcile state after a failure, and the ability to reconstruct the event sequence without manually piecing it together from multiple logs.
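The "durable fact" framing above can be sketched as a minimal publish-subscribe log (all names hypothetical, an in-memory list standing in for broker storage): the event is retained first, every subscriber receives it independently, and a failing consumer is recorded as a redelivery candidate rather than hiding the event from the others.

```python
class EventLog:
    """Minimal fan-out sketch: retain first, then deliver independently."""

    def __init__(self):
        self.events = []       # durable, append-only record of facts
        self.subscribers = {}  # name -> handler

    def subscribe(self, name, handler):
        self.subscribers[name] = handler

    def publish(self, event):
        self.events.append(event)  # the fact is retained before delivery
        delivered = {}
        for name, handler in self.subscribers.items():
            try:
                handler(event)
                delivered[name] = True
            except Exception:
                delivered[name] = False  # candidate for redelivery or replay
        return delivered
```

Because the log is append-only, "who saw which command" is a query over recorded facts rather than a reconstruction from scattered request logs.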
This becomes a risk assessment issue when an incorrect or delayed message can change the behavior of a machine, process, or protective measure. In that case, the question is no longer just about integration convenience; the impact, detectability, and ability to limit the consequences must be assessed, which is consistent with risk analysis according to EN ISO 12100. If the communication concerns safety-related functions, interlocks, start conditions, or confirmation of energy status, the boundary of design responsibility shifts from the application level to the level of the machine system as a whole. The same applies to actuating systems, including hydraulic ones: an incorrect assumption that information will be delivered on time may conflict with the principles for designing protective measures and safe states, which naturally leads to issues associated with EN ISO 4413. In other words, queues and brokers are not “better by definition,” but they become the right choice where the design must withstand communication failures without losing control, history, and accountability for the actions performed.
How to approach this in practice
In practice, the question is not whether a REST API is good or bad, but whether it fits the consequences of an error, delay, or missing response in a given industrial process. If the communication is used mainly for reading data, initiating administrative actions, or integrating with business systems, a request-response interface may be the simplest and least expensive solution. The problem starts when the design assumes continuity of information exchange despite temporary unavailability on one side, the need for ordered event processing, or the obligation to reconstruct who triggered a specific state change, when, and on what basis. In that setup, choosing REST as the default mechanism often lowers the entry cost but increases the cost of handling failures, reconciling state after an interruption, and explaining an incident. That is the point at which queues, brokers, and event-driven communication stop being an “architectural extra” and become a tool for reducing design risk and operational liability.
For the team and the manager, this means making an architectural decision based on several measurable characteristics of the process, not on the contractor’s preference. The most useful criterion is simple: it must be decided what should happen to a message when the recipient does not respond at the moment it is sent. If the correct answer is “nothing critical, the operation can safely be retried or discarded,” REST is usually sufficient. If, however, the message must be retained, delivered after operations resume, processed only once, or handled in an order that can be demonstrated, then a synchronous architecture starts to diverge from the process requirements. It is worth recording this already at the assumptions stage as acceptance criteria: the permissible period of partner unavailability, the retry method, deduplication rules, the ability to trace event correlation, and the method for restoring state after a failure. Without such arrangements, the project may appear to move faster at first, but later return in the form of costly integration changes, disputes over the scope of responsibility, and operational nonconformities that are difficult to close.
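Two of the acceptance criteria named above, deduplication rules and traceable event correlation, can be expressed as a consumer-side check. This is a sketch with hypothetical field names (`id`, `correlation_id`) and an unbounded in-memory deduplication set; a production system would bound the window and persist it.

```python
class IdempotentConsumer:
    """Process each message id at most once; record every arrival."""

    def __init__(self, handler):
        self.handler = handler
        self.seen_ids = set()  # deduplication window
        self.trace = []        # correlation trail for audits

    def handle(self, message):
        # Every arrival is traced, including duplicates: the trail must
        # show what was received, not only what was applied.
        self.trace.append((message["correlation_id"], message["id"]))
        if message["id"] in self.seen_ids:
            return "duplicate-ignored"
        self.seen_ids.add(message["id"])
        self.handler(message)
        return "processed"
```

Writing these rules down as acceptance criteria means a redelivered message is a logged non-event rather than a double-executed operation.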
A good example is a line or cell where the supervisory system sends orders and the controllers and workstations report completion, rejects, interlocks, or transitions between operating modes. If every event is immediately polled via REST, even a brief loss of connectivity quickly creates a mismatch between the actual state and the state shown in the application. From a production perspective, this ends in manual reconciliation; from a quality perspective, it creates a gap in the batch history; and from a maintenance perspective, it leaves uncertainty as to whether a given command was executed or merely sent. A broker with persistent message storage does not solve everything, but it does clarify responsibility: the sender published the event, the intermediary system retained it, and the receiver either acknowledged it or did not. That is a fundamental difference when analysing the causes of downtime and determining whether the error resulted from process logic, a network failure, or an incorrect sequence of operator actions. That is why the choice of communication model affects not only implementation cost, but also commissioning, service, and audit time.
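The responsibility split described above, published, retained, acknowledged or not, can be sketched as a minimal channel (hypothetical names, an in-memory dict standing in for persistent storage): the sender's job ends at publish, and anything the receiver has not acknowledged remains visible for redelivery instead of being silently lost.

```python
class DurableChannel:
    """Minimal retain-until-acknowledged sketch."""

    def __init__(self):
        self.pending = {}  # message_id -> message, awaiting acknowledgement
        self.next_id = 0

    def publish(self, message):
        self.next_id += 1
        self.pending[self.next_id] = message
        return self.next_id  # the sender's proof of handoff

    def ack(self, message_id):
        """Receiver confirms processing; the broker may now forget it."""
        self.pending.pop(message_id, None)

    def unacknowledged(self):
        """What would be redelivered after a receiver restart."""
        return list(self.pending.values())
```

The `unacknowledged` view is precisely what turns "was the command executed or merely sent?" from a dispute into a query.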
This becomes a matter of practical risk assessment according to ISO 12100 when a message is no longer just information, but a condition for machine operation, process execution, or a protective measure. If the ability to start, resume after a stop, release a sequence, or confirm a safe energy state depends on correct transmission of status, then the integration decision becomes part of a more consequential design decision. In that case, it is necessary to assess not only communication availability, but also the effects of loss, delay, duplication, and misinterpretation; this is where the methodology known from ISO 12100 naturally comes into play. Conversely, where communication affects the conditions for preventing unexpected start-up, the information layer must not be treated as a substitute for solutions intended for energy isolation and a safe state. This is the boundary where the topic already intersects with hazard identification in accordance with ISO 12100 and, more broadly, with machine system design beyond functionality alone. In other words, REST is suitable for industry when its limitations are consciously accepted by the process; where they are not, queuing and event-driven communication mechanisms are more appropriate because they better address continuity, accountability, and control of failure consequences.
What to watch for during implementation
The most common implementation mistake is treating the choice between a REST API and event-driven communication as a purely technical decision, when in industry it is a decision with operational and organisational consequences. REST does not stop working simply because it is used on the shop floor, but its limitations become apparent very quickly wherever the system must absorb connectivity interruptions, uneven loads, temporary service unavailability, and the need to reconstruct the sequence of events afterwards. If the architecture assumes that every response must arrive immediately and on the first attempt, the design becomes fragile. The outcome is usually predictable: integration costs rise, workarounds multiply, and responsibility for an incorrect process state becomes blurred across system suppliers. Queues and brokers, in turn, do not solve the problem automatically; they introduce risks of their own, such as delayed processing, duplicate messages, the need to restore sequence order, and more complex monitoring. So the question is not whether REST is always suitable for industry, but whether a given process can tolerate the characteristics of this form of communication without shifting risk onto production, maintenance, and compliance.
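One of the broker-side risks named above, restoring sequence order, has a standard shape worth sketching. Assuming hypothetical per-source sequence numbers in a `seq` field, a resequencer applies only the next expected message and parks out-of-order arrivals until the gap is filled.

```python
class Resequencer:
    """Apply messages in sequence order; park arrivals that leave a gap."""

    def __init__(self):
        self.next_seq = 1
        self.parked = {}   # seq -> message, waiting for an earlier one
        self.applied = []  # messages in their original order

    def receive(self, message):
        self.parked[message["seq"]] = message
        # Drain everything that is now contiguous with what was applied.
        while self.next_seq in self.parked:
            self.applied.append(self.parked.pop(self.next_seq))
            self.next_seq += 1
```

The cost is visible in the `parked` dict: a permanently missing message stalls everything behind it, which is why monitoring must cover parked backlogs, not only throughput.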
At the design stage, it is worth adopting a simple assessment criterion: what exactly happens to the process if a message does not arrive, arrives twice, or arrives too late. If the only consequence is delayed data refresh in a reporting system, REST may be sufficient. However, if the lack of a response blocks a sequence, forces manual intervention, leads to loss of the execution history of an operation, or makes it harder to determine who made a decision and on what basis, then a synchronous architecture starts generating cost already at the commissioning stage. In that situation, communication based on a queue or broker usually provides clearer responsibility: the sender confirms handoff, the receiver processes at its own pace, and the team can observe backlogs, retries, and errors. For the project manager, this means measuring not only service availability, but also indicators such as message dwell time, retry rate, the number of unaccounted-for messages, and the time needed to reconstruct the event history after an incident.
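The indicators listed above can be gathered with a small instrumentation hook at the consumer. This is a sketch with hypothetical names: dwell time (enqueue to completion) and retry counts per message, from which a retry rate can be derived, so the team watches backlogs and sick links rather than only service availability.

```python
import time


class QueueMetrics:
    """Minimal per-message dwell-time and retry-rate collector."""

    def __init__(self):
        self.enqueued_at = {}  # message_id -> enqueue timestamp
        self.retries = {}      # message_id -> retry count
        self.dwell_times = []  # seconds from enqueue to completion

    def on_enqueue(self, message_id, now=None):
        self.enqueued_at[message_id] = time.monotonic() if now is None else now

    def on_retry(self, message_id):
        self.retries[message_id] = self.retries.get(message_id, 0) + 1

    def on_done(self, message_id, now=None):
        done = time.monotonic() if now is None else now
        self.dwell_times.append(done - self.enqueued_at.pop(message_id))

    def retry_rate(self):
        """Retries per completed message; a rising value flags a sick link."""
        return sum(self.retries.values()) / max(len(self.dwell_times), 1)
```

A growing dwell time with a stable retry rate points at a slow consumer; a growing retry rate points at an unstable link. The distinction is exactly what the paragraph above asks the project manager to measure.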
In practice, the problem becomes especially visible where a single integration starts performing several roles at once. For example, the supervisory system sends an order to a workstation, receives confirmation of execution, and at the same time records a status that conditions further line start-up. As long as we are talking about the exchange of business data, a delay of a few seconds may be acceptable. However, if that same communication path starts influencing an execution decision in the process, it ceases to be a neutral IT add-on. At that point, choosing the wrong mechanism affects not only the cost of downtime, but also responsibility for whether the system responds predictably to loss of connectivity, a service restart, or a duplicate message. This is the point at which the topic naturally moves into the area of machine system design beyond functionality alone: it is necessary to decide which failure effects may be tolerated and which must be isolated from the integration layer.
The boundary becomes even more important when communication starts to affect conditions related to functional safety or risk assessment. If meeting a safe-state condition, authorizing a restart, confirming the absence of hazardous energy, or any other protective function depends on correct data exchange, good integration practice alone is not enough. At that point, it must be clearly determined whether the component in question remains purely an information layer or already falls within the design scope of control system parts responsible for safety functions. This is where the relevant questions from EN ISO 13849-1 and practical risk assessment according to ISO 12100 come into play, but only after the function and the consequences of failure have been defined. For the team, this means separating what can be handled by a queue, broker, or REST from what must not rely solely on general-purpose communication. If that boundary is not set at the outset, the cost returns later in the form of design changes, acceptance disputes, and decision-making responsibility that is difficult to defend.
Is a REST API always suitable for industrial applications?
No. REST is well suited to a simple request-response model, but it is less effective when a message must survive a temporary recipient outage or be processed later.
When is REST the right choice?
Use it when you need a current status readout or an explicit call with an immediate response. It also works well where a failure to execute is easy to detect and can be safely retried without side effects.
When are queues, brokers, or event-driven communication the better choice?
When the sender cannot wait for the recipient, and the message must be retained and processed despite disruptions. They also matter when a single event is meant to trigger effects in several independent systems.
What does it cost to use REST beyond its fit?
Problems grow around retries, reconciling inconsistent states, and reconstructing history after an incident. In practice, a temporary outage of a single component can block the entire sequence of actions.
Is event-driven communication always the better option?
No. If an immediate, binding answer is required, or a decision must block further machine movement, an asynchronous model without well-designed intermediate states may increase uncertainty.