Surging on Centrifugal Compressor: When Big Problems Start from Small Modifications
Introduction
The Recycle Nitrogen Compressor and Booster Expander in the Air Separation Plant had suffered chronic problems since commissioning. The trigger was an installation modification originally intended only to bring expander speed information to the control room for operational purposes, but it ended up having long-term technical consequences.
It took nearly ten years, from when I started working in this industry in 2007 until the problem was finally resolved in 2017, to truly understand and solve this issue. During that time, and certainly before it, various efforts were made, but most addressed only the symptoms, not the root cause. This article records that long journey from my perspective: from initial suspicions, through field experiments, to the resolution that finally restored the system to its original design logic. Please feel free to offer counterarguments if you find inaccuracies.
The Beginning of the Problem
From the start, this problem was approached from a flawed premise: almost all attention was focused on the recycle nitrogen compressor, as if it were always the culprit. However, from the beginning, I suspected something was amiss, especially in the shutdown sequence between the recycle nitrogen compressor and the booster expander.
DCS showed that the recycle nitrogen compressor tripped first, but I doubted the accuracy of this data. There was a hardwired interlock that allowed the recycle nitrogen compressor to trip upon request from the booster expander, while DCS—which processes hundreds of I/Os—could experience latency in recording the sequence of events, especially those occurring within milliseconds. This is because DCS performs a cyclic scan of I/Os sequentially, whereas trip signals can appear asynchronously, either ahead of or behind the scan cycle.
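The scan effect described above can be illustrated with a small simulation. This is only a sketch under assumed numbers: the 100 ms scan period, the read offsets, and the trip times are hypothetical values chosen for illustration, not measurements from the plant's DCS, and `Channel` and `cyclic_scan_log` are names invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    name: str
    read_offset_ms: float  # when, within each scan, this input is sampled (assumed)
    trip_time_ms: float    # true time at which the trip signal goes high (assumed)


def cyclic_scan_log(channels, scan_period_ms, n_scans):
    """Return (scan_start_ms, name) entries in the order a cyclic scanner would log them."""
    seen = set()
    log = []
    for scan in range(n_scans):
        scan_start = scan * scan_period_ms
        # Inputs are read one after another, in a fixed order, inside each scan.
        for ch in sorted(channels, key=lambda c: c.read_offset_ms):
            sample_time = scan_start + ch.read_offset_ms
            if ch.name not in seen and ch.trip_time_ms <= sample_time:
                seen.add(ch.name)
                # The event is stamped with the scan, not with the true trip time.
                log.append((scan_start, ch.name))
    return log


def true_order(channels):
    """Names sorted by the real trip time."""
    return [c.name for c in sorted(channels, key=lambda c: c.trip_time_ms)]


channels = [
    # The booster trips first, but just after its input was sampled in that scan.
    Channel("booster_expander", read_offset_ms=10.0, trip_time_ms=130.0),
    # The compressor trips 20 ms later, but its input is sampled late in the scan.
    Channel("recycle_n2_compressor", read_offset_ms=80.0, trip_time_ms=150.0),
]

log = cyclic_scan_log(channels, scan_period_ms=100.0, n_scans=4)
recorded_order = [name for _, name in log]
print("true order:    ", true_order(channels))
print("recorded order:", recorded_order)
```

With these assumed timings the recorded order is the reverse of the true order: the compressor is logged in the scan starting at 100 ms and the booster only in the scan starting at 200 ms. A real DCS is more sophisticated than this model, but the sketch shows why a scan-based event list alone cannot settle which machine tripped first when the events are only milliseconds apart.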
To establish the actual shutdown sequence, I assembled a Siemens S7 PLC as a high-speed event logger. High time resolution was the only way to prove objectively which machine actually tripped first. The test results were consistent: in 100% of events, the booster expander tripped first. The chain of events started with the sudden opening of the booster bypass, which triggered surging on the recycle nitrogen compressor and eventually caused it to trip as well through interlock commands. Of course, not everyone immediately agreed with this conclusion, because understanding how a DCS and a PLC work requires deep technical expertise.
Ladder Logic – Fair Trip Detection
The ladder logic below implements a first-trip detector using rising-edge recognition and memory flags.
It records whichever input (recycle nitrogen compressor or booster expander) trips first, provided the two trips arrive in different PLC scans. If both edges appear within the same scan, rung order decides, so the scan time must be shorter than the interval being resolved.
// Detect rising edge of input C60
Rung 1:
|----[ I0.0 ]----[/ M0.3 ]----------------( M0.1 )     // C60 pulse (one scan)
Rung 2:
|----[ I0.0 ]-----------------------------( M0.3 )     // Save C60 state
// Detect rising edge of input CD10
Rung 3:
|----[ I0.1 ]----[/ M0.4 ]----------------( M0.2 )     // CD10 pulse (one scan)
Rung 4:
|----[ I0.1 ]-----------------------------( M0.4 )     // Save CD10 state
// Status outputs
Rung 5:
|----[ I0.0 ]-----------------------------( Q0.2 )     // C60 has tripped
Rung 6:
|----[ I0.1 ]-----------------------------( Q0.3 )     // CD10 has tripped
// Determine who tripped first (latched, so the result outlives the one-scan pulse)
Rung 7:
|----[ M0.1 ]----[/ M0.0 ]-------------+--[SET Q0.0]   // C60 tripped first
|                                      +--[SET M0.0]   // Lock trip detection
Rung 8:
|----[ M0.2 ]----[/ M0.0 ]-------------+--[SET Q0.1]   // CD10 tripped first
|                                      +--[SET M0.0]   // Lock trip detection
// Manual reset
Rung 9:
|----[ I0.2 ]--------------------------+--[RST M0.0]   // Reset trip flag
|                                      +--[RST Q0.0]
|                                      +--[RST Q0.1]
|                                      +--[RST Q0.2]
|                                      +--[RST Q0.3]
Symbol | Type | Function Description |
---|---|---|
I0.0 | Digital Input | Trip signal from compressor C60 |
I0.1 | Digital Input | Trip signal from booster expander CD10 |
I0.2 | Digital Input | Manual reset button to clear all trip flags |
Q0.0 | Digital Output | Indicates C60 tripped first |
Q0.1 | Digital Output | Indicates CD10 tripped first |
Q0.2 | Digital Output | Status output: C60 has tripped |
Q0.3 | Digital Output | Status output: CD10 has tripped |
M0.0 | Memory Bit | Trip detection flag (only one trip allowed per scan) |
M0.1 | Memory Bit | Rising edge detected on C60 |
M0.2 | Memory Bit | Rising edge detected on CD10 |
M0.3 | Memory Bit | Previous input state of C60 (for edge detection) |
M0.4 | Memory Bit | Previous input state of CD10 (for edge detection) |
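For readers who do not read ladder diagrams, the same idea can be sketched in Python as a scan-by-scan model. This is an illustration, not the program that ran on the logger: the class and attribute names are invented here, and it assumes that Q0.0, Q0.1 and M0.0 behave as latched (set/reset) bits, which is how the reset rung suggests they were meant to work.

```python
class FirstTripLatch:
    """Scan-based model of the first-trip logic; comments map to the symbol table."""

    def __init__(self):
        self.prev_c60 = False    # M0.3: previous state of the C60 trip input
        self.prev_cd10 = False   # M0.4: previous state of the CD10 trip input
        self.locked = False      # M0.0: trip detection lock
        self.c60_first = False   # Q0.0: C60 tripped first
        self.cd10_first = False  # Q0.1: CD10 tripped first

    def scan(self, c60, cd10, reset=False):
        """Run one PLC scan with the given input states and return (Q0.0, Q0.1)."""
        # Rungs 1-4: a pulse is true only in the scan where the input rises.
        c60_pulse = c60 and not self.prev_c60
        cd10_pulse = cd10 and not self.prev_cd10
        self.prev_c60, self.prev_cd10 = c60, cd10
        # Rungs 7-8, evaluated in order: the first pulse latches and locks out the other.
        if c60_pulse and not self.locked:
            self.c60_first = True
            self.locked = True
        if cd10_pulse and not self.locked:
            self.cd10_first = True
            self.locked = True
        # Rung 9: manual reset clears the latches.
        if reset:
            self.locked = self.c60_first = self.cd10_first = False
        return self.c60_first, self.cd10_first


# CD10 rises one scan before C60: the logger latches "CD10 first" and keeps it.
plc = FirstTripLatch()
inputs = [(False, False), (False, True), (True, True), (True, True)]
results = [plc.scan(c60, cd10) for c60, cd10 in inputs]
print(results[-1])

# Both inputs rise within the same scan: rung order decides, so C60 wins the tie.
tie = FirstTripLatch()
tie_result = tie.scan(True, True)
print(tie_result)
```

The second case is the practical limit of this approach: the logger can only separate two trips that land in different scans, which is why a fast, dedicated PLC was used instead of relying on the DCS event list.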
Since commissioning, to reduce the risk of surging, the recycle nitrogen compressor has never been operated at maximum performance. Its bypass tends to remain open, and to maintain operational stability, operators developed a daily routine: reducing expander speed when ambient temperatures began to rise in the morning, and increasing it again when the night was cooler—an art of survival between the safe design limits and the ever-changing operational realities. This strategy was quite effective: cooler nights were utilized to boost booster performance. Meanwhile, hot days were kept stable—because at high temperatures, the risk of surging and tripping was much greater.
However, operating a centrifugal compressor with a bypass that is not fully closed made the entire plant very noisy. This was a costly performance compromise—eroding energy efficiency while reducing work comfort due to constant noise.
Experiments and Investigations
This operating strategy, which depended on judging ambient conditions, did not always yield the expected results. When the judgment was off, responding too early or too late to a change, the adjustments to expander speed repeatedly failed to prevent surging or tripping.
This problem worsened at the end of 2016, despite various improvements previously made, including replacing several control components and sensors. These efforts turned out to only address the symptoms, not the root cause. The situation was like dealing with a disease with an unclear diagnosis, then trying various medications that never truly cured it.
Management then decided to conduct a total overhaul of both machines—recycle nitrogen compressor and booster expander. However, due to personnel limitations from the vendor, only the recycle nitrogen compressor was worked on.
When the recycle nitrogen compressor was tested without the booster expander, it ran at maximum performance without any issues. This result further strengthened my belief that the recycle nitrogen compressor was not the root cause of the problems that had been occurring. Yet just a few days after the plant resumed full operation, surging and tripping occurred again, bringing us back to the initial confusion.
I then decided to form a small team of selected control, instrumentation, and mechanical technicians. Only they were allowed to be involved, and they took instructions only from me. Together, we conducted a series of intensive investigations:
- Taking turns recording the local panels on video in real time, uncomfortable as that was
- Remodeling the trip signals, once we had obtained valuable recordings from the moment of a disturbance
- Evaluating the 4–20 mA signal wiring, which turned out to be loaded with three devices, not two as it should have been
- Simulating our assumptions, which let us reconstruct the root cause with precision
The most crucial finding came from something that had been overlooked from the start: the analog signal actually came from a 5 V DC source, not 24 V DC as I had assumed. This was not just a numerical difference but a fundamental difference in power capacity and tolerance to load. I discovered this when I took the old speed controller documentation home and read it.
Furthermore, the output was actually loaded with three devices, whereas the original design allowed for only two. The combination of low source voltage and excessive load caused significant signal distortion. The higher the operating speed, the greater the loop current, and the more voltage was dropped across each load.
At a certain point, the 5 V DC source could no longer supply the voltage the loads required, and the analog signal suddenly plummeted. This is where the problem peaked: the anti-surge controller on the booster expander read this condition as a serious disturbance and responded aggressively, even though no response was actually needed.
import numpy as np
import matplotlib.pyplot as plt

# Loop current range (mA)
current_mA = np.linspace(4, 20, 1000)
# Convert to RPM (0–45000 RPM)
rpm = (current_mA - 4) / 16 * 45000
# Load resistances (ohms)
resistors = [125, 250, 350]
colors = ['tab:blue', 'tab:orange', 'tab:red']
# Voltage required across each load resistance (V = I * R)
voltages = {R: current_mA / 1000 * R for R in resistors}

plt.figure(figsize=(10, 6))
for i, R in enumerate(resistors):
    voltage = voltages[R]
    label = f'{R} Ω Load' if R < 350 else f'{R} Ω Load (design violation)'
    linestyle = '-' if R < 350 else '--'  # Dashed line indicates thermal shift or excessive load
    plt.plot(rpm, voltage, label=label, color=colors[i], linestyle=linestyle)
    # Mark where the voltage reaches 5 V; skip curves that never get there
    if voltage.max() >= 5 - 1e-9:
        idx_5 = np.argmin(np.abs(voltage - 5))
        rpm_at_5V = rpm[idx_5]
        plt.axvline(rpm_at_5V, color=colors[i], linestyle=':', alpha=0.6)
        plt.text(rpm_at_5V, 5.3, f'{int(rpm_at_5V)} RPM', rotation=90,
                 verticalalignment='bottom', horizontalalignment='right',
                 fontsize=8, color=colors[i])

# Mark where the overloaded 350 Ω curve reaches 4.8 V
voltage_350 = voltages[350]
idx_4 = np.argmin(np.abs(voltage_350 - 4.8))
rpm_at_4 = rpm[idx_4]
plt.axvline(rpm_at_4, color=colors[2], linestyle=':', alpha=0.6)
plt.text(rpm_at_4, 5.3, f'{int(rpm_at_4)} RPM', rotation=90,
         verticalalignment='bottom', horizontalalignment='right',
         fontsize=8, color=colors[2])

# Compliance limit line
plt.axhline(5, color='black', linestyle='--', label='Maximum Voltage (5 V)')
# Warning voltage line
plt.axhline(4.8, color='red', linestyle='--', label='Voltage Drop Due to Load (4.8 V)')

# Labels and titles
plt.title('Analog Speed Signal Distortion Due to Voltage Drop')
plt.xlabel('Expander Speed (RPM)')
plt.ylabel('Analog Voltage (V)')
plt.grid(True)
plt.legend()
plt.xlim(0, 45000)
plt.ylim(0, 7)
plt.tight_layout()

# Export as SVG
plt.savefig("Voltage_vs_Speed.svg", format='svg')
plt.show()
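The crossover points in the plot can also be worked out directly. The sketch below is a simplified model under stated assumptions: it treats the 5 V source as ideal up to its limit, reuses the load values (125, 250 and 350 Ω) and the 0–45000 RPM scaling from the plot script, and the function names are invented for illustration. A real analog output will typically saturate somewhat below its nominal supply voltage.

```python
def saturation_current_ma(supply_v, load_ohm):
    """Loop current (mA) at which the voltage needed across the load equals the supply."""
    return supply_v / load_ohm * 1000.0


def current_to_rpm(current_ma, span_rpm=45000.0):
    """Linear 4-20 mA scaling, the same conversion used in the plot script."""
    return (current_ma - 4.0) / 16.0 * span_rpm


def max_readable_rpm(supply_v, load_ohm, span_rpm=45000.0):
    """Highest speed the loop can represent before the source runs out of voltage."""
    i_sat = min(saturation_current_ma(supply_v, load_ohm), 20.0)
    return current_to_rpm(i_sat, span_rpm)


for load in (125, 250, 350):
    print(f"5 V source, {load} ohm load: up to {max_readable_rpm(5.0, load):.0f} RPM")

# The same overloaded loop on an assumed 24 V source would still cover the full range.
print(f"24 V source, 350 ohm load: up to {max_readable_rpm(24.0, 350):.0f} RPM")
```

Under these assumptions a 250 Ω load just reaches full scale at 20 mA, while a 350 Ω load runs out of voltage at about 14.3 mA, or roughly 28,900 RPM. Above that speed the signal can no longer follow the machine, which matches the sudden collapse described above.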
Finally, it became clear: the initial circuit modification—which was intended only to display information in the control room—accidentally created measurement anomalies. This small disturbance, which for years was hidden behind assumptions and routines, turned out to be the main cause of complex and costly systemic disturbances.
Small Modifications, Big Results
After successfully reconstructing the root cause, I prepared a brief presentation to explain what actually happened, along with a very simple solution: returning the circuit to its original design. I was confident this solution was safe and appropriate, as it did not involve changes to logic or devices, but merely corrected the proven faulty circuit modification.
With full confidence, I presented this proposal in a management meeting. What happened was beyond my expectations: I was not allowed to carry it out and was directed to wait for vendor technicians to come for clarification and repairs. Yet for years, neither the vendor nor anyone else had ever fully resolved this issue.
I tried again, this time through a personal approach after the meeting, to explain the situation technically. But the result was the same: no room to act.
At this point, I began to feel that the obstacles I faced were no longer technical, but political. An irony, when deep understanding of the problem was not the basis of trust.
Because vendor support remained uncertain, I had already decided in my heart: whenever the plant tripped again, the solution I had prepared would be immediately executed. And of course, that day truly came. Still following procedures, I asked for permission from my direct superior before acting. Fortunately, the speed controller I had recently replaced had two output channels—eliminating the need for the previously planned signal splitter. With a slight program modification, I set both to output identical signals—one for the anti-surge controller, and one shared for local display and signals to DCS. The circuit modification became much simpler—and safer.
With just a 30 cm cable and one small screwdriver, I reconnected the analog output that had been the source of the hidden problem. After the repair, the booster expander's anti-surge controller was retested—and the results were undeniable: the system fully recovered, without anomalies.
"Sometimes, major repairs start from things that seem trivial—as long as we are persistent enough to find them and do not assume a problem is eternal."
Direct and Indirect Effects
This small modification brought significant impacts on plant performance, both technically and economically:
- The recycle nitrogen compressor can now operate with its bypass fully closed, allowing the system to run close to design performance. Nitrogen no longer circulates aimlessly, which also reduces the load on the cooling system.
- The centrifugal compressor became silent, no longer producing the characteristic whistle sound from bypass flow—a silence that signifies efficiency.
- Energy consumption—and as a result, electricity bills—have dropped drastically, as ineffective nitrogen cycles were eliminated, and the need for machine restarts was greatly reduced.
- LIN and LAR production increased, as processes ran more stably and optimally according to the original design.
Of course, these changes were not felt immediately. It took time to dismantle operating patterns that had formed over years and to build confidence in the modification, which in turn required further adjustments to the process.
At that time, I did not yet have the ability to calculate plant capacity independently. But according to the information I received, this modification had a tangible impact:
Savings of over 1 billion rupiah per month, from a combination of reduced electricity consumption and increased production output.
The exact figures can be debated. But the compressor's silence—no longer screaming—that is the most tangible proof that change has occurred.
Cultural and Perception Challenges
The biggest challenge in solving problems is often not technical, but work culture and decision-making structures. In complex systems that have been operating for a long time, it is not uncommon for no single individual to truly understand the problem completely. Understanding is scattered and fragmented—some lies with engineers, some with operators, and some hidden in documentation that is rarely revisited.
In such conditions, the direction of troubleshooting is more often determined by who has authority, not by who understands the problem best. Solutions are often shaped by intuition, habits, or pressure to quickly “restore operations”—not by deep diagnostic results.
This happens not because the system is deliberately left to deviate from the design, but because efforts to find the root cause repeatedly hit dead ends. When various approaches fail to yield results, the organization eventually chooses the path considered safest: sticking to a stable configuration even if it deviates. As long as the system is still running and no fatal damage occurs, this condition is gradually accepted as the new normal.
But when disturbances become more frequent, routine handling is no longer sufficient. Courage is needed to propose a counter-narrative—opening the possibility that the root cause has not been truly addressed, that there is something that has been overlooked in our collective understanding.
True change usually only has room when the pressure is high enough, the data is strong enough, and the frustration is deep enough. At this point, a new approach finally has a chance to be tried—and real improvements can begin to happen.
Closing
This is not merely about repairing circuits or control logic, but an effort to reorganize trust in the original design—and more than that, the courage to correct a system that has long been considered final.
I even thought to myself, “If the output of the new speed controller had just happened to be 12V or 24V, maybe the issue would’ve resolved itself.” But then again, I might never have discovered the real root cause.
After two decades of operating in a deviated condition, this system finally works as it should. Not because the tools changed, but because our understanding of them changed. A long process that brought invaluable technical and managerial lessons.
Author's Note
This writing is not just about technical matters, but also about change management, historical analysis, and resilience in facing resistance.
Muscle mass isn't everything — neurons fire faster. 🧠⚡
And if all else fails...