How to resolve a service stuck in Removing in Rancher v1.6

In Rancher v1.6, a service can sometimes get stuck in the Removing state.

All containers of this service had already been deleted in the user interface. I verified this on the Docker hosts using “docker ps -a”, and yes, all container instances were correctly removed. Yet the service itself was still stuck in Removing in Rancher.

Furthermore, under Admin -> Processes, the service.remove processes (apparently left over because the service was stuck in its removal) never disappeared and were restarted every 2 minutes.

Although I’m not sure what caused this, the reason might have been several actions happening on that particular service at almost the same time.
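To check whether overlapping actions hit the same service, the process history in Rancher's database can be listed in start order. This is a minimal sketch, not a verified query: it assumes the process_instance table has a resource_id column holding the service's ID (as the <<resource_id>> placeholder in the MySQL section suggests) and that service processes are named service.*, like service.remove. Replace <<resource_id>> before running it.

```sql
## Hypothetical diagnostic: list all service.* processes of one service,
## oldest first, to spot actions that overlapped in time.
## Assumption: process_instance.resource_id holds the service ID.
SELECT id, process_name, start_time, end_time, exit_reason
FROM process_instance
WHERE resource_id = <<resource_id>>
  AND process_name LIKE 'service.%'
ORDER BY start_time;
```

Two processes overlapped if one started before the other's end_time; a row whose end_time is still NULL while a later service.remove starts would fit the theory above.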

Resolution

As you can see, while I attempted a service rollback, another user deleted the same service at (almost) the same time. I wouldn’t be surprised if this confused Rancher: the “delete” task may have finished before the “rollback,” causing the rollback to trip up the system. The second “delete” attempt was meant to see whether it would somehow “force” the removal, but it didn’t work. So much for the theory (only someone from Rancher could confirm it or, better, give the real reason for what happened); now let’s solve the problem.

Because all attempts via the Rancher UI and API failed (the service stayed in the Removing state), I started researching and came across the following issues:

MySQL

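## Find the open processes
SELECT * FROM process_instance WHERE end_time IS NULL;

## Mark all open processes as finished (useful if there are many of them)
UPDATE process_instance SET exit_reason = 'DONE', end_time = NOW() WHERE end_time IS NULL;

## Check for open service.remove processes (their resource_id is the stuck service's ID)
SELECT * FROM process_instance WHERE end_time IS NULL AND process_name = 'service.remove';

## Set the stuck service back to active (replace <<resource_id>> with that service's ID)
UPDATE service SET state = 'active' WHERE id = <<resource_id>>;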
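Instead of closing every open process, a narrower variant finishes only the stuck service.remove processes of one service and wraps both updates in a transaction, so the result can be checked before it becomes permanent. This is a sketch, not a tested procedure: it assumes the tables use InnoDB (so ROLLBACK actually undoes the changes) and that the resource_id column of process_instance holds the service's ID.

```sql
## Hypothetical narrower fix for a single stuck service.
## Replace <<resource_id>> with the stuck service's ID before running.
START TRANSACTION;

## Close only the open service.remove processes of this service
UPDATE process_instance
   SET exit_reason = 'DONE', end_time = NOW()
 WHERE end_time IS NULL
   AND process_name = 'service.remove'
   AND resource_id = <<resource_id>>;

## Set the service back to active, as in the steps above
UPDATE service SET state = 'active' WHERE id = <<resource_id>>;

## Inspect the result: expect state = 'active'
SELECT id, state FROM service WHERE id = <<resource_id>>;

## Make it permanent, or run ROLLBACK; instead if something looks wrong
COMMIT;
```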