Building a Reliable BLE Command Queue on Android for Medical Devices
If you have spent any time working with Android's BLE stack, you have probably run into the silent failures that happen when you issue two GATT operations at the same time: a characteristic write that never arrives, a read that returns stale data, a descriptor write for enabling notifications that never takes effect. The root cause is almost always the same: Android's BluetoothGatt can process only one operation at a time, and it provides no built-in queuing mechanism.
For medical device apps, this is not a minor inconvenience. It is a reliability problem that can compromise patient safety. When a clinician sends a "stop stimulation" command to a neurostimulation device and the command is silently dropped because a characteristic read was already in progress, the consequences are serious. Building BLE-connected apps for neurotechnology companies in Toronto and across Canada, our team at DEVSFLOW has made the command queue the foundational layer of every project. This article walks through how to build one that is production-ready.
Why GATT Operations Are Not Thread-Safe
The Android BLE documentation does not make this obvious, but BluetoothGatt is fundamentally a one-operation-at-a-time interface. When you call writeCharacteristic(), the write is dispatched to the Bluetooth stack asynchronously, and the result arrives later in onCharacteristicWrite(). What happens if you call readCharacteristic() before that callback fires depends on the Android version and the vendor's Bluetooth stack. On some devices the read silently fails. On others it corrupts the pending write. On a few, it triggers a GATT error (status 133) that forces a reconnection.
The AOSP source shows where the silent failures come from. Internally, BluetoothGatt tracks the in-flight request with a single mDeviceBusy flag. If you call readCharacteristic(), writeCharacteristic(), or their descriptor equivalents while this flag is set, the call is rejected: the older methods return false, and the Android 13 writeCharacteristic() and writeDescriptor() overloads return BluetoothStatusCodes.ERROR_GATT_WRITE_REQUEST_BUSY. These return values are easy to ignore, and even if you check them, there is no built-in mechanism to retry the operation later.
This problem affects every GATT operation:
- readCharacteristic() and writeCharacteristic()
- readDescriptor() and writeDescriptor() (including enabling/disabling notifications)
- requestMtu()
- readRemoteRssi()
- requestConnectionPriority() (this one is technically fire-and-forget, but issuing it concurrently with other operations can cause instability on some chipsets)
The solution is a serial command queue that ensures only one GATT operation is in flight at any time, waits for the corresponding callback before dispatching the next operation, and handles timeouts and errors gracefully.
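To make the constraint concrete, here is a minimal JVM-only model rather than Android code. FakeGattStack is a hypothetical stand-in for BluetoothGatt that rejects a request while another is in flight (loosely mirroring the mDeviceBusy check) and reports results on a separate callback thread. SerialDispatcher, also hypothetical, starts the next operation only from the previous operation's completion callback. Timeouts and retries are deliberately left out of this sketch.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for BluetoothGatt: one request at a time, with the
// result delivered later on a separate callback thread.
class FakeGattStack {
    private val busy = AtomicBoolean(false)
    private val callbackThread: ExecutorService = Executors.newSingleThreadExecutor()

    // Like the pre-Android 13 BluetoothGatt methods, returns false while busy.
    fun submit(name: String, onComplete: (String) -> Unit): Boolean {
        if (!busy.compareAndSet(false, true)) return false
        callbackThread.execute {
            Thread.sleep(10) // Simulated radio round trip
            busy.set(false) // Free the stack before reporting the result
            onComplete(name)
        }
        return true
    }

    fun close() = callbackThread.shutdown()
}

// Dispatches the next operation only from the previous one's callback.
class SerialDispatcher(private val stack: FakeGattStack) {
    private val pending = ArrayDeque<String>()
    private val completed = mutableListOf<String>()
    private var inFlight = false

    @Synchronized
    fun enqueue(name: String) {
        pending.addLast(name)
        if (!inFlight) dispatchNext()
    }

    @Synchronized
    fun completedSoFar(): List<String> = completed.toList()

    @Synchronized
    private fun onComplete(name: String) {
        completed += name
        inFlight = false
        dispatchNext()
    }

    // Always called with the monitor held.
    private fun dispatchNext() {
        val next = pending.removeFirstOrNull() ?: return
        inFlight = true
        check(stack.submit(next, ::onComplete)) { "stack unexpectedly busy" }
    }
}
```

Submitting two requests straight to FakeGattStack drops the second one, which is the dropped-operation symptom described above; routing the same requests through SerialDispatcher completes all of them, in order.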
Designing a Serial Command Queue
The core data structure is straightforward: a FIFO queue of command objects, each representing a single GATT operation. A dispatcher pulls the next command from the queue, executes it, and waits for the callback before pulling the next one.
import android.bluetooth.BluetoothGattCharacteristic
import java.util.UUID

sealed class BleCommand {
    abstract val priority: Int
    abstract val timeoutMs: Long
    abstract val maxRetries: Int
    // Initialised from maxRetries by BleCommandQueue.enqueue(). Initialising it
    // here would read maxRetries before the subclass has assigned it (it reads 0).
    var retriesRemaining: Int = 0
    // Enqueue order, assigned by BleCommandQueue.enqueue(); keeps equal-priority commands FIFO
    var sequence: Long = 0L
    data class WriteCharacteristic(
        val serviceUuid: UUID,
        val characteristicUuid: UUID,
        val value: ByteArray,
        val writeType: Int = BluetoothGattCharacteristic.WRITE_TYPE_DEFAULT,
        override val priority: Int = PRIORITY_NORMAL,
        override val timeoutMs: Long = 5000L,
        override val maxRetries: Int = 2
    ) : BleCommand()
    data class ReadCharacteristic(
        val serviceUuid: UUID,
        val characteristicUuid: UUID,
        override val priority: Int = PRIORITY_NORMAL,
        override val timeoutMs: Long = 5000L,
        override val maxRetries: Int = 2
    ) : BleCommand()
    data class EnableNotification(
        val serviceUuid: UUID,
        val characteristicUuid: UUID,
        val enable: Boolean = true,
        override val priority: Int = PRIORITY_NORMAL,
        override val timeoutMs: Long = 5000L,
        override val maxRetries: Int = 3
    ) : BleCommand()
    data class RequestMtu(
        val mtu: Int,
        override val priority: Int = PRIORITY_HIGH,
        override val timeoutMs: Long = 5000L,
        override val maxRetries: Int = 1
    ) : BleCommand()
    data class Disconnect(
        override val priority: Int = PRIORITY_CRITICAL,
        override val timeoutMs: Long = 2000L,
        override val maxRetries: Int = 0
    ) : BleCommand()
    companion object {
        const val PRIORITY_NORMAL = 0
        const val PRIORITY_HIGH = 1
        const val PRIORITY_CRITICAL = 2
    }
}
The queue itself uses a PriorityBlockingQueue so that high-priority commands (like disconnect or emergency stop) jump ahead of pending normal-priority operations. A PriorityBlockingQueue makes no ordering guarantee among elements of equal priority, so the comparator breaks ties with an enqueue sequence number; without it, two normal-priority commands could run in the opposite order from the one in which they were enqueued. The dispatcher runs in its own coroutine on Dispatchers.IO, where the blocking take() ties up one pool thread rather than the main thread or the Binder thread where BLE callbacks arrive:
import android.bluetooth.BluetoothGatt
import android.util.Log
import java.util.concurrent.PriorityBlockingQueue
import java.util.concurrent.Semaphore
import java.util.concurrent.atomic.AtomicLong
import kotlinx.coroutines.*

class BleCommandQueue(
    private val gatt: BluetoothGatt,
    private val commandExecutor: BleCommandExecutor
) {
    // Highest priority first; equal priorities in enqueue (FIFO) order
    private val queue = PriorityBlockingQueue<BleCommand>(
        16,
        compareByDescending<BleCommand> { it.priority }.thenBy { it.sequence }
    )
    private val nextSequence = AtomicLong()
    private val operationLock = Semaphore(1)
    private val scope = CoroutineScope(
        SupervisorJob() + Dispatchers.IO + CoroutineName("BleCommandQueue")
    )
    @Volatile
    private var isRunning = true
    init {
        scope.launch {
            while (isActive) {
                // take() blocks until a command is available; runInterruptible
                // lets shutdown() interrupt it instead of leaking the thread
                val command = runInterruptible { queue.take() }
                executeWithRetry(command)
            }
        }
    }
    fun enqueue(command: BleCommand) {
        if (!isRunning) {
            Log.w(TAG, "Queue is shut down, rejecting command: $command")
            return
        }
        command.retriesRemaining = command.maxRetries
        command.sequence = nextSequence.getAndIncrement()
        queue.offer(command)
    }
    private suspend fun executeWithRetry(command: BleCommand) {
        operationLock.acquire()
        try {
            val success = commandExecutor.execute(gatt, command)
            if (!success && command.retriesRemaining > 0) {
                command.retriesRemaining--
                delay(100) // Brief delay before retry
                // Re-enqueue; it keeps its sequence number, so it runs before
                // later commands of the same priority
                queue.offer(command)
            }
        } catch (e: TimeoutCancellationException) {
            Log.e(TAG, "Command timed out: $command")
            if (command.retriesRemaining > 0) {
                command.retriesRemaining--
                queue.offer(command)
            }
        } finally {
            operationLock.release()
        }
    }
    fun shutdown() {
        isRunning = false
        scope.cancel()
        queue.clear()
    }
    companion object {
        private const val TAG = "BleCommandQueue"
    }
}
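Only one command executes at a time. The single dispatcher coroutine does not take the next command until executeWithRetry() returns, and execute() does not return until the corresponding GATT callback arrives or the command times out. The Semaphore(1) makes that invariant explicit and keeps it intact if executeWithRetry() is ever called from more than one place. This is the core invariant that prevents concurrent GATT operations.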
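The ordering guarantee deserves a quick check of its own. The PriorityBlockingQueue Javadoc states that it makes no guarantees about the ordering of elements with equal priority, so a comparator on priority alone can reorder two normal-priority commands. This JVM-only sketch (QueuedCommand and OrderedCommandQueue are hypothetical names) breaks ties with an AtomicLong sequence number assigned at enqueue time:

```kotlin
import java.util.concurrent.PriorityBlockingQueue
import java.util.concurrent.atomic.AtomicLong

// Hypothetical minimal command type for this sketch.
data class QueuedCommand(val name: String, val priority: Int, val sequence: Long)

class OrderedCommandQueue {
    private val nextSequence = AtomicLong()

    // Highest priority first; ties broken by enqueue order (FIFO).
    private val queue = PriorityBlockingQueue<QueuedCommand>(
        16,
        compareByDescending<QueuedCommand> { it.priority }.thenBy { it.sequence }
    )

    fun enqueue(name: String, priority: Int) {
        queue.offer(QueuedCommand(name, priority, nextSequence.getAndIncrement()))
    }

    // Non-blocking drain for the demo; a real dispatcher would call take().
    fun drainNames(): List<String> {
        val out = mutableListOf<String>()
        while (true) {
            val next = queue.poll() ?: break
            out += next.name
        }
        return out
    }
}
```

Draining a queue that received read-a, read-b, notify, stop (critical) and write yields stop first, then the normal-priority commands in the order they were enqueued.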
Handling Write Confirmations
The command executor bridges between the queue and the actual GATT calls. For write operations, the critical design decision is how to wait for the onCharacteristicWrite() callback. Using a CompletableDeferred from Kotlin coroutines is the cleanest approach:
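import android.bluetooth.BluetoothGatt
import android.bluetooth.BluetoothGattCallback
import android.bluetooth.BluetoothGattCharacteristic
import android.bluetooth.BluetoothGattDescriptor
import android.bluetooth.BluetoothStatusCodes
import android.os.Build
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.withTimeout

// open so that tests can substitute a test double for execute()
open class BleCommandExecutor : BluetoothGattCallback() {
    // Assigned on the queue's coroutine, completed on the Binder thread
    @Volatile
    private var pendingCompletion: CompletableDeferred<Int>? = null
    open suspend fun execute(
        gatt: BluetoothGatt,
        command: BleCommand
    ): Boolean {
        if (command is BleCommand.Disconnect) {
            // No per-operation callback to await; the outcome arrives in
            // onConnectionStateChange(), which the connection layer handles
            gatt.disconnect()
            return true
        }
        // Create the deferred before dispatching so a fast callback is not missed
        val completion = CompletableDeferred<Int>()
        pendingCompletion = completion
        try {
            val dispatched = when (command) {
                is BleCommand.WriteCharacteristic -> dispatchWrite(gatt, command)
                is BleCommand.ReadCharacteristic -> dispatchRead(gatt, command)
                is BleCommand.EnableNotification -> dispatchNotification(gatt, command) // defined below
                is BleCommand.RequestMtu -> dispatchMtuRequest(gatt, command)
                is BleCommand.Disconnect -> false // Handled above
            }
            if (!dispatched) return false
            // Wait for the callback. On timeout, withTimeout throws
            // TimeoutCancellationException, which BleCommandQueue catches.
            val status = withTimeout(command.timeoutMs) { completion.await() }
            return status == BluetoothGatt.GATT_SUCCESS
        } finally {
            pendingCompletion = null
        }
    }
    private fun dispatchWrite(
        gatt: BluetoothGatt,
        command: BleCommand.WriteCharacteristic
    ): Boolean {
        val service = gatt.getService(command.serviceUuid) ?: return false
        val characteristic = service.getCharacteristic(
            command.characteristicUuid
        ) ?: return false
        return if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.TIRAMISU) {
            val result = gatt.writeCharacteristic(
                characteristic,
                command.value,
                command.writeType
            )
            result == BluetoothStatusCodes.SUCCESS
        } else {
            characteristic.writeType = command.writeType
            @Suppress("DEPRECATION")
            characteristic.value = command.value
            @Suppress("DEPRECATION")
            gatt.writeCharacteristic(characteristic)
        }
    }
    private fun dispatchRead(
        gatt: BluetoothGatt,
        command: BleCommand.ReadCharacteristic
    ): Boolean {
        val characteristic = gatt.getService(command.serviceUuid)
            ?.getCharacteristic(command.characteristicUuid) ?: return false
        return gatt.readCharacteristic(characteristic)
    }
    private fun dispatchMtuRequest(
        gatt: BluetoothGatt,
        command: BleCommand.RequestMtu
    ): Boolean = gatt.requestMtu(command.mtu)
    // GATT callbacks - these complete the pending operation.
    // complete() is a no-op if the deferred has already been completed.
    override fun onCharacteristicWrite(
        gatt: BluetoothGatt,
        characteristic: BluetoothGattCharacteristic,
        status: Int
    ) {
        pendingCompletion?.complete(status)
    }
    // Android 13+ delivers read results here (forward `value` to your data layer)
    override fun onCharacteristicRead(
        gatt: BluetoothGatt,
        characteristic: BluetoothGattCharacteristic,
        value: ByteArray,
        status: Int
    ) {
        pendingCompletion?.complete(status)
    }
    // Android 12 and lower call only this deprecated overload
    @Deprecated("Deprecated in Java")
    override fun onCharacteristicRead(
        gatt: BluetoothGatt,
        characteristic: BluetoothGattCharacteristic,
        status: Int
    ) {
        pendingCompletion?.complete(status)
    }
    override fun onDescriptorWrite(
        gatt: BluetoothGatt,
        descriptor: BluetoothGattDescriptor,
        status: Int
    ) {
        pendingCompletion?.complete(status)
    }
    override fun onMtuChanged(
        gatt: BluetoothGatt,
        mtu: Int,
        status: Int
    ) {
        pendingCompletion?.complete(status)
    }
}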
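A few details worth highlighting. First, the dispatchWrite() method handles both the new API (Android 13+/Tiramisu) and the deprecated API for older devices. You need both code paths in production. The same applies to callbacks: below Android 13 the framework calls only the deprecated three-argument onCharacteristicRead(), so an executor that must work on older devices has to override both overloads. Second, the withTimeout wrapper ensures that a missing callback (which does happen on some devices) does not stall the queue forever. Third, the CompletableDeferred is created before the GATT call and completed in the callback, creating a clean async bridge between the queue's coroutine and the Binder thread that delivers callbacks.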
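If you want to see the bridge pattern without Android or coroutines, here is a JVM-only analog. It swaps in java.util.concurrent.CompletableFuture and its timed get() for CompletableDeferred and withTimeout. CallbackBridge is a hypothetical name, and returning null for "not dispatched or timed out" is a simplification for the sketch:

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException
import java.util.concurrent.atomic.AtomicReference

// JVM analog of the CompletableDeferred + withTimeout bridge (hypothetical names).
class CallbackBridge {
    // AtomicReference gives the callback thread a consistent view of the slot
    private val pending = AtomicReference<CompletableFuture<Int>?>(null)

    // Runs `dispatch`, which must start the async operation and return whether
    // it was accepted, then blocks until the callback arrives or the timeout.
    // Returns the callback's status, or null if not dispatched or timed out.
    fun execute(timeoutMs: Long, dispatch: () -> Boolean): Int? {
        val future = CompletableFuture<Int>()
        pending.set(future) // Register before dispatching so a fast callback is not lost
        try {
            if (!dispatch()) return null
            return future.get(timeoutMs, TimeUnit.MILLISECONDS)
        } catch (e: TimeoutException) {
            return null // Missing callback: give up instead of stalling
        } finally {
            pending.compareAndSet(future, null)
        }
    }

    // Called on the callback thread when the stack reports a result.
    fun onCallback(status: Int) {
        pending.get()?.complete(status)
    }
}
```

As in the coroutine version, a callback that arrives after a timeout completes whatever future is pending at that moment, so late callbacks need a guard of their own.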
Read, Write, and Notify Ordering
The most common initialization sequence for a BLE medical device involves multiple operation types that must execute in a specific order. A typical sequence looks like this:
- Request MTU (to maximize payload size)
- Read device information characteristics (firmware version, serial number)
- Enable notifications on the data streaming characteristic
- Enable notifications on the status/battery characteristic
- Write a "start session" command to the control characteristic
With the command queue in place, this becomes straightforward:
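import java.util.UUID

class DeviceInitializer(private val queue: BleCommandQueue) {
    fun initializeDevice(serviceUuid: UUID, controlUuid: UUID, dataUuid: UUID, statusUuid: UUID) {
        // Step 1: Request MTU
        queue.enqueue(BleCommand.RequestMtu(mtu = 247))
        // Step 2: Read device info. Firmware version and serial number usually
        // live in the standard Device Information Service, not in the
        // device's own service.
        queue.enqueue(BleCommand.ReadCharacteristic(
            serviceUuid = DEVICE_INFO_SERVICE_UUID,
            characteristicUuid = FIRMWARE_REVISION_UUID
        ))
        queue.enqueue(BleCommand.ReadCharacteristic(
            serviceUuid = DEVICE_INFO_SERVICE_UUID,
            characteristicUuid = SERIAL_NUMBER_UUID
        ))
        // Step 3: Enable data notifications
        queue.enqueue(BleCommand.EnableNotification(
            serviceUuid = serviceUuid,
            characteristicUuid = dataUuid
        ))
        // Step 4: Enable status notifications
        queue.enqueue(BleCommand.EnableNotification(
            serviceUuid = serviceUuid,
            characteristicUuid = statusUuid
        ))
        // Step 5: Start session
        queue.enqueue(BleCommand.WriteCharacteristic(
            serviceUuid = serviceUuid,
            characteristicUuid = controlUuid,
            value = byteArrayOf(0x01) // START command
        ))
    }
    companion object {
        // Bluetooth SIG-assigned UUIDs: Device Information Service (0x180A),
        // Serial Number String (0x2A25), Firmware Revision String (0x2A26)
        val DEVICE_INFO_SERVICE_UUID: UUID = UUID.fromString("0000180a-0000-1000-8000-00805f9b34fb")
        val SERIAL_NUMBER_UUID: UUID = UUID.fromString("00002a25-0000-1000-8000-00805f9b34fb")
        val FIRMWARE_REVISION_UUID: UUID = UUID.fromString("00002a26-0000-1000-8000-00805f9b34fb")
    }
}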
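Because the queue is serial, the operations execute in this order, and each waits for its callback before the next one starts. That relies on commands of equal priority being dispatched in the order they were enqueued: PriorityBlockingQueue makes no ordering guarantee among elements that compare as equal, so its comparator needs a tie-breaker such as an enqueue sequence number. (The MTU request is PRIORITY_HIGH, so it runs first regardless.) Without the queue, you would need nested callbacks or a complex state machine to achieve the same ordering guarantee.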
One subtle point about enabling notifications: this involves two separate steps. First, you call setCharacteristicNotification() on the local GATT client (this is a local-only operation and does not require queuing). Then, you write to the Client Characteristic Configuration Descriptor (CCCD) on the remote device (this is a GATT write and must go through the queue). The EnableNotification command in our queue encapsulates both steps:
private fun dispatchNotification(
    gatt: BluetoothGatt,
    command: BleCommand.EnableNotification
): Boolean {
    val service = gatt.getService(command.serviceUuid) ?: return false
    val characteristic = service.getCharacteristic(
        command.characteristicUuid
    ) ?: return false
    // Choose the CCCD value first: notifications if the characteristic
    // supports them, otherwise indications
    val properties = characteristic.properties
    val value = when {
        !command.enable -> BluetoothGattDescriptor.DISABLE_NOTIFICATION_VALUE
        (properties and BluetoothGattCharacteristic.PROPERTY_NOTIFY) != 0 ->
            BluetoothGattDescriptor.ENABLE_NOTIFICATION_VALUE
        (properties and BluetoothGattCharacteristic.PROPERTY_INDICATE) != 0 ->
            BluetoothGattDescriptor.ENABLE_INDICATION_VALUE
        else -> return false // Supports neither notify nor indicate
    }
    // Step 1: Local registration (not a GATT operation)
    if (!gatt.setCharacteristicNotification(characteristic, command.enable)) {
        return false
    }
    // Step 2: Write the CCCD descriptor (this IS a GATT operation)
    val cccdUuid = UUID.fromString("00002902-0000-1000-8000-00805f9b34fb")
    val descriptor = characteristic.getDescriptor(cccdUuid) ?: return false
    return if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.TIRAMISU) {
        val result = gatt.writeDescriptor(descriptor, value)
        result == BluetoothStatusCodes.SUCCESS
    } else {
        @Suppress("DEPRECATION")
        descriptor.value = value
        @Suppress("DEPRECATION")
        gatt.writeDescriptor(descriptor)
    }
}
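On the wire, the CCCD write is just two bytes. This JVM-only sketch spells out the values behind BluetoothGattDescriptor.ENABLE_NOTIFICATION_VALUE ({0x01, 0x00}), ENABLE_INDICATION_VALUE ({0x02, 0x00}) and DISABLE_NOTIFICATION_VALUE ({0x00, 0x00}), and how the choice between notifications and indications follows from the characteristic's property bits. The helpers uuid16 and cccdValue are hypothetical; preferring notifications when both are supported is a design choice, and a device that needs acknowledged delivery may call for indications instead:

```kotlin
import java.util.UUID

// Characteristic property bits from the Bluetooth Core Specification; Android
// exposes the same values as BluetoothGattCharacteristic.PROPERTY_NOTIFY/INDICATE.
val PROPERTY_NOTIFY = 0x10
val PROPERTY_INDICATE = 0x20

// Expands a 16-bit SIG-assigned UUID onto the Bluetooth Base UUID.
fun uuid16(shortUuid: Int): UUID =
    UUID.fromString("%08x-0000-1000-8000-00805f9b34fb".format(shortUuid))

val CCCD_UUID: UUID = uuid16(0x2902)

// The CCCD value is a 16-bit little-endian bit field: bit 0 enables
// notifications, bit 1 enables indications. Prefers notifications.
fun cccdValue(properties: Int, enable: Boolean): ByteArray = when {
    !enable -> byteArrayOf(0x00, 0x00)
    (properties and PROPERTY_NOTIFY) != 0 -> byteArrayOf(0x01, 0x00)
    (properties and PROPERTY_INDICATE) != 0 -> byteArrayOf(0x02, 0x00)
    else -> throw IllegalArgumentException("Characteristic supports neither notify nor indicate")
}
```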
Timeout and Retry Logic
Timeouts and retries are where a basic command queue becomes a production-ready one. There are several failure modes that timeouts must handle:
- Missing callback: The GATT operation was dispatched but the callback never fires. This can happen when the device moves out of range mid-operation, or due to firmware bugs in the peripheral. Without a timeout, the queue stalls permanently.
- Status 133 (GATT_ERROR): The most common generic error on Android BLE. It can mean almost anything. The correct response is usually to retry once. If the retry also fails with 133, disconnect and reconnect.
- Status 5 (GATT_INSUFFICIENT_AUTHENTICATION): The device requires bonding. Retrying will not help; you need to initiate pairing first.
- Status 132 (GATT_BUSY, 0x84 in the Android Bluetooth stack): The GATT stack is busy with another request. This should not happen if the queue is working correctly, but it can occur during reconnection if old operations from a previous connection are still pending internally.
A robust retry strategy differentiates between transient and permanent failures:
import android.bluetooth.BluetoothGatt

class RetryPolicy {
    fun shouldRetry(command: BleCommand, status: Int): RetryDecision {
        // Check success first: a command that succeeds on its last attempt
        // must not be reported as GiveUp
        if (status == BluetoothGatt.GATT_SUCCESS) {
            return RetryDecision.Success
        }
        if (command.retriesRemaining <= 0) {
            return RetryDecision.GiveUp(status)
        }
        return when (status) {
            // Transient errors - retry after a delay
            133 -> RetryDecision.RetryAfter(delayMs = 200) // GATT_ERROR (0x85)
            132 -> RetryDecision.RetryAfter(delayMs = 500) // GATT_BUSY (0x84)
            // Connection-level errors - no point retrying the individual command.
            // These values are reported as the status of onConnectionStateChange();
            // in a read or write callback, 8 is the ATT error Insufficient Authorization.
            8 -> RetryDecision.Reconnect // GATT_CONN_TIMEOUT
            19 -> RetryDecision.Reconnect // GATT_CONN_TERMINATE_PEER_USER
            22 -> RetryDecision.Reconnect // GATT_CONN_TERMINATE_LOCAL_HOST
            // Permanent errors - retrying will not help
            5 -> RetryDecision.PermanentFailure(status) // INSUFFICIENT_AUTHENTICATION
            6 -> RetryDecision.PermanentFailure(status) // REQUEST_NOT_SUPPORTED
            13 -> RetryDecision.PermanentFailure(status) // INVALID_ATTRIBUTE_LENGTH
            // Unknown status - retry while retries remain (bounded by maxRetries)
            else -> RetryDecision.RetryAfter(delayMs = 100)
        }
    }
    sealed class RetryDecision {
        data object Success : RetryDecision()
        data class RetryAfter(val delayMs: Long) : RetryDecision()
        data object Reconnect : RetryDecision()
        data class PermanentFailure(val status: Int) : RetryDecision()
        data class GiveUp(val status: Int) : RetryDecision()
    }
}
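The policy above returns a fixed delay per status code. If you would rather have the delay grow with each attempt, capped exponential backoff is one common option; it is a swapped-in technique, not part of the policy above. In this sketch backoffDelayMs is a hypothetical helper, and the attempt number could be derived as maxRetries - retriesRemaining:

```kotlin
import kotlin.math.min

// Delay before retry `attempt` (0-based): baseMs, 2 * baseMs, 4 * baseMs, ...
// capped at maxMs so a run of failures cannot stall the queue for long.
fun backoffDelayMs(baseMs: Long, attempt: Int, maxMs: Long = 2_000L): Long {
    require(baseMs > 0 && attempt >= 0) { "baseMs must be positive and attempt non-negative" }
    // Clamp the shift so large attempt numbers cannot overflow the Long
    val scaled = baseMs shl min(attempt, 20)
    return min(scaled, maxMs)
}
```

With a 200 ms base this gives 200, 400, 800 and 1,600 ms, then stays at the 2,000 ms cap.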
The timeout duration matters more than you might expect. Setting timeouts too short (under 2 seconds) causes false positives on slow devices or congested radio environments. Setting them too long (over 10 seconds) means a stalled queue takes too long to recover. For most medical device operations, 5 seconds is a reasonable default. Some operations may need longer timeouts (up to 10 seconds): service discovery, for example, involves many round trips at the GATT level, and some peripherals are slow to answer MTU requests.
Priority Commands: Disconnect and Emergency Stop
For medical devices, some commands must bypass the normal queue and execute immediately. The two most important cases are disconnect (when the user or the app decides to end the session) and emergency stop (when the device is performing an action like electrical stimulation that must be halted immediately).
The priority system in our BleCommand class handles this. Critical-priority commands are dequeued before normal-priority commands. But there is a subtlety: even a critical command cannot execute while another command is in progress (the semaphore enforces this). For a true emergency stop, you may need to cancel the in-progress operation:
import java.util.UUID
import kotlinx.coroutines.Job

class BleCommandQueue(
    private val gatt: BluetoothGatt,
    private val commandExecutor: BleCommandExecutor
) {
    // ... previous code ...
    // Job of the command currently executing. For it to be cancellable on its
    // own, the dispatcher loop runs each command in a child job:
    //     val job = launch { executeWithRetry(command) }
    //     currentCommandJob = job
    //     job.join()
    @Volatile
    private var currentCommandJob: Job? = null
    fun enqueueEmergencyStop(
        serviceUuid: UUID,
        controlUuid: UUID,
        stopPayload: ByteArray
    ) {
        // 1. Drop all pending non-critical commands
        val retained = mutableListOf<BleCommand>()
        queue.drainTo(retained)
        retained.filter { it.priority >= BleCommand.PRIORITY_CRITICAL }
            .forEach { queue.offer(it) }
        // 2. Cancel the currently executing command. executeWithRetry()
        //    releases operationLock in its finally block; releasing it here
        //    as well would leave two permits and allow two commands in flight.
        currentCommandJob?.cancel()
        // 3. Enqueue the emergency stop at critical priority. Going through
        //    enqueue() also initialises retriesRemaining from maxRetries.
        enqueue(BleCommand.WriteCharacteristic(
            serviceUuid = serviceUuid,
            characteristicUuid = controlUuid,
            value = stopPayload,
            priority = BleCommand.PRIORITY_CRITICAL,
            timeoutMs = 2000L,
            maxRetries = 3 // Try hard to deliver this
        ))
    }
    fun enqueueDisconnect() {
        // Clear all pending commands - we are disconnecting
        queue.clear()
        // Cancel the current operation; its finally block releases operationLock
        currentCommandJob?.cancel()
        enqueue(BleCommand.Disconnect())
    }
}
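This is the one place where we break the strict serial guarantee. An emergency stop must preempt whatever is currently in progress. The tradeoff is that the interrupted command fails: its coroutine is cancelled while it waits for the GATT callback. This is acceptable because the session is being terminated anyway. Keep in mind that cancellation frees the queue, not the radio. Android offers no way to abort a GATT operation that is already in flight, so the stack may reject the stop write as busy until the interrupted operation's callback arrives, which is one more reason the stop command is retried.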
For clients like RE-AK Technologies building safety-critical sensor systems, and teams at CLEIO working on regulated medical devices, we add additional safeguards: the emergency stop command is retried up to three times, and if all retries fail, the app triggers a full GATT disconnect as a last resort (which returns the peripheral to its safe default state, provided its firmware is designed to stop on link loss).
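The drain-and-filter step can be exercised on the JVM without a device. This sketch (Cmd and PreemptibleQueue are hypothetical names) also surfaces a subtlety: if equal-priority commands are dispatched FIFO, a critical command that was already queued would run before a newly enqueued stop. Here the stop gets a negative sequence number so it goes first:

```kotlin
import java.util.concurrent.PriorityBlockingQueue
import java.util.concurrent.atomic.AtomicLong

// Hypothetical minimal command for this sketch.
data class Cmd(val name: String, val priority: Int, val seq: Long)

class PreemptibleQueue {
    private val nextSeq = AtomicLong()

    // Highest priority first, FIFO among equal priorities
    val queue = PriorityBlockingQueue<Cmd>(
        16, compareByDescending<Cmd> { it.priority }.thenBy { it.seq }
    )

    fun enqueue(name: String, priority: Int) {
        queue.offer(Cmd(name, priority, nextSeq.getAndIncrement()))
    }

    // Drops pending commands below criticalPriority, then enqueues the stop
    // with a negative sequence number so it runs ahead of critical commands
    // that were already queued. Returns the dropped commands for reporting.
    fun preemptWith(stopName: String, criticalPriority: Int): List<Cmd> {
        val drained = mutableListOf<Cmd>()
        queue.drainTo(drained)
        val (kept, dropped) = drained.partition { it.priority >= criticalPriority }
        kept.forEach { queue.offer(it) }
        queue.offer(Cmd(stopName, criticalPriority, -1 - nextSeq.getAndIncrement()))
        return dropped
    }
}
```

The drain, filter and re-offer sequence is not atomic with respect to concurrent enqueue() calls; a command enqueued in between survives the purge.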
Testing the Queue Under Load
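A command queue that works during normal operation but fails under load is worse than no queue at all, because it creates a false sense of reliability. Testing must cover several stress scenarios. The tests below rely on fixtures that are not shown: a mockGatt stub, testServiceUuid and testCharUuid, and MockBleCommandExecutor, a test double whose execute() runs the supplied onExecute lambda (so BleCommandExecutor must be open, or hidden behind an interface, for it to be substituted).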
Rapid-fire command injection
Simulate a user rapidly toggling settings or a data pipeline that generates commands faster than BLE can process them:
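// Imports used by the tests in this section
import java.util.Collections
import java.util.concurrent.atomic.AtomicInteger
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withTimeout
import org.junit.Assert.assertEquals
import org.junit.Assert.assertTrue
import org.junit.Test

@Test
fun `queue handles rapid command injection without dropping commands`() = runBlocking<Unit> {
    // Written on the queue's IO thread, read on the test thread
    val completedCommands = Collections.synchronizedList(mutableListOf<BleCommand>())
    val mockExecutor = MockBleCommandExecutor(
        onExecute = { command ->
            delay(50) // Simulate GATT operation latency
            completedCommands.add(command)
            true
        }
    )
    val queue = BleCommandQueue(mockGatt, mockExecutor)
    // Enqueue 100 commands as fast as possible
    repeat(100) { i ->
        queue.enqueue(BleCommand.WriteCharacteristic(
            serviceUuid = testServiceUuid,
            characteristicUuid = testCharUuid,
            value = byteArrayOf(i.toByte())
        ))
    }
    // The queue runs on Dispatchers.IO, which runTest's virtual time does not
    // control, so wait in real time (about 5 s here) with an upper bound
    withTimeout(15_000) {
        while (completedCommands.size < 100) delay(20)
    }
    queue.shutdown()
    // All 100 commands should have been executed in order
    assertEquals(100, completedCommands.size)
    completedCommands.forEachIndexed { index, command ->
        val write = command as BleCommand.WriteCharacteristic
        assertEquals(index.toByte(), write.value[0])
    }
}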
Timeout recovery
Verify that a timed-out command does not stall the queue:
@Test
fun `queue recovers after command timeout`() = runBlocking<Unit> {
    val callCount = AtomicInteger(0)
    val mockExecutor = MockBleCommandExecutor(
        onExecute = { command ->
            if (callCount.incrementAndGet() == 1) {
                // First command: simulate a missing callback. The real executor
                // bounds its wait with withTimeout, so the mock must as well;
                // otherwise nothing ever times out and the queue stalls.
                withTimeout(command.timeoutMs) { delay(Long.MAX_VALUE) }
            }
            true // Subsequent commands succeed
        }
    )
    val queue = BleCommandQueue(mockGatt, mockExecutor)
    // First command will time out
    queue.enqueue(BleCommand.WriteCharacteristic(
        serviceUuid = testServiceUuid,
        characteristicUuid = testCharUuid,
        value = byteArrayOf(0x01),
        timeoutMs = 1000L,
        maxRetries = 0
    ))
    // Second command should still execute after the timeout
    queue.enqueue(BleCommand.WriteCharacteristic(
        serviceUuid = testServiceUuid,
        characteristicUuid = testCharUuid,
        value = byteArrayOf(0x02)
    ))
    // Real-time wait: the queue runs on Dispatchers.IO, not a test dispatcher
    withTimeout(5_000) {
        while (callCount.get() < 2) delay(20)
    }
    queue.shutdown()
    // Both commands should have been attempted
    assertTrue(callCount.get() >= 2)
}
Priority preemption
Verify that high-priority commands execute before queued normal-priority commands:
@Test
fun `critical commands execute before normal commands`() = runBlocking<Unit> {
    // Written on the queue's IO thread, read on the test thread
    val executionOrder = Collections.synchronizedList(mutableListOf<Int>())
    val mockExecutor = MockBleCommandExecutor(
        onExecute = { command ->
            executionOrder.add(command.priority)
            delay(50)
            true
        }
    )
    val queue = BleCommandQueue(mockGatt, mockExecutor)
    // Block the queue with a slow command
    queue.enqueue(BleCommand.ReadCharacteristic(
        serviceUuid = testServiceUuid,
        characteristicUuid = testCharUuid,
        priority = BleCommand.PRIORITY_NORMAL
    ))
    // Enqueue several normal commands
    repeat(5) {
        queue.enqueue(BleCommand.WriteCharacteristic(
            serviceUuid = testServiceUuid,
            characteristicUuid = testCharUuid,
            value = byteArrayOf(0x01),
            priority = BleCommand.PRIORITY_NORMAL
        ))
    }
    // Enqueue a critical command
    queue.enqueue(BleCommand.WriteCharacteristic(
        serviceUuid = testServiceUuid,
        characteristicUuid = testCharUuid,
        value = byteArrayOf(0xFF.toByte()),
        priority = BleCommand.PRIORITY_CRITICAL
    ))
    // Wait in real time for all 7 commands to run
    withTimeout(5_000) {
        while (executionOrder.size < 7) delay(20)
    }
    queue.shutdown()
    // The dispatcher may or may not have taken the first command before the
    // critical one arrived, so the critical command runs first or second
    val indexOfCritical = executionOrder.indexOf(BleCommand.PRIORITY_CRITICAL)
    assertTrue(indexOfCritical in 0..1)
}
Disconnection during active queue
This is the most important stress test. When a BLE device disconnects unexpectedly, the queue must handle it gracefully. All pending commands should fail with a clear error, not hang indefinitely:
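@Test
fun `queue clears gracefully on unexpected disconnection`() = runBlocking<Unit> {
    // Written on the queue's thread, read on the test thread
    val failedCommands = Collections.synchronizedList(mutableListOf<BleCommand>())
    val mockExecutor = MockBleCommandExecutor(
        onExecute = { _ ->
            delay(100)
            true
        },
        // Invoked for each command the queue abandons after the link drops
        onDisconnect = { command ->
            failedCommands.add(command)
        }
    )
    val queue = BleCommandQueue(mockGatt, mockExecutor)
    // Enqueue several commands
    repeat(10) {
        queue.enqueue(BleCommand.WriteCharacteristic(
            serviceUuid = testServiceUuid,
            characteristicUuid = testCharUuid,
            value = byteArrayOf(it.toByte())
        ))
    }
    // Simulate disconnection after roughly 200 ms of real time
    delay(200)
    // onConnectionLost() is assumed: it is not part of the queue shown earlier,
    // and would cancel the current command and fail every pending one
    queue.onConnectionLost()
    withTimeout(2_000) {
        while (failedCommands.isEmpty()) delay(20)
    }
    queue.shutdown()
    // Remaining commands should have been failed, not executed
    assertTrue(failedCommands.size > 0)
}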
Production Considerations
Beyond the core queue implementation, there are several details that matter in production medical device apps:
Thread safety of the callback bridge. The pendingCompletion variable in the executor is accessed from two threads: the queue's coroutine (which sets it) and the Binder thread (which completes it). CompletableDeferred is thread-safe, but assigning the variable itself is not. Use @Volatile or an AtomicReference to prevent visibility issues.
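@Volatile fixes visibility, but a related hazard remains: a callback that arrives after its command has timed out can complete the next command's deferred with the wrong status. One mitigation, an assumption rather than anything Android provides, is to tag the pending operation with what it targets and ignore callbacks that do not match. This JVM-only sketch uses AtomicReference and CompletableFuture; PendingOperation and PendingSlot are hypothetical names:

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.atomic.AtomicReference

// The operation awaiting a callback, tagged with what it targets
// (for example the operation kind plus a characteristic UUID).
class PendingOperation(val key: String) {
    val result = CompletableFuture<Int>()
}

class PendingSlot {
    private val current = AtomicReference<PendingOperation?>(null)

    // Called on the queue side before dispatching the GATT call.
    fun begin(key: String): PendingOperation =
        PendingOperation(key).also { current.set(it) }

    // Called on the callback thread. Ignores callbacks that do not match the
    // pending operation, such as a late callback for one that already timed out.
    fun complete(key: String, status: Int): Boolean {
        val op = current.get() ?: return false
        if (op.key != key || !current.compareAndSet(op, null)) return false
        return op.result.complete(status)
    }

    // Called on timeout: clears the slot only if it still holds `op`.
    fun abandon(op: PendingOperation) {
        current.compareAndSet(op, null)
    }
}
```

The key narrows the window rather than closing it: two consecutive writes to the same characteristic share a key, so a late callback from the first could still complete the second.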
Connection state changes. When onConnectionStateChange fires with STATE_DISCONNECTED, you must cancel all pending commands and clear the queue. Attempting to execute GATT operations on a disconnected BluetoothGatt object can cause crashes on some OEM implementations.
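The cancel-and-clear step might look like this on the JVM, assuming each queued command carries a future that its caller waits on. QueuedOp, ConnectionLostException and failAllPending are hypothetical names; failing fast with a dedicated exception lets callers tell a dropped link apart from an ordinary GATT error:

```kotlin
import java.util.concurrent.BlockingQueue
import java.util.concurrent.CompletableFuture

// Hypothetical queued command whose caller waits on `outcome`.
class QueuedOp(val name: String) {
    val outcome = CompletableFuture<Int>()
}

class ConnectionLostException(message: String) : Exception(message)

// Call on STATE_DISCONNECTED: removes every pending command and fails it with
// a clear error instead of letting callers wait for timeouts.
// Returns the number of commands failed.
fun failAllPending(queue: BlockingQueue<QueuedOp>, reason: String): Int {
    val drained = mutableListOf<QueuedOp>()
    queue.drainTo(drained)
    drained.forEach { it.outcome.completeExceptionally(ConnectionLostException(reason)) }
    return drained.size
}
```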
Service discovery caching. After calling discoverServices(), the service/characteristic/descriptor objects are cached in the BluetoothGatt object. If the device reconnects and the GATT database has changed (rare but possible with firmware updates), the cached objects become stale. Always re-discover services after reconnection.
Logging for regulatory compliance. Medical device apps in Canada and the US often need to demonstrate that every command sent to the device was acknowledged. The command queue is the natural place to add this logging. Log the command type, timestamp, GATT status, retry count, and latency for every operation. This data is invaluable during incident investigation and regulatory audits.
import android.bluetooth.BluetoothGatt
import androidx.room.Entity
import androidx.room.PrimaryKey

// Room entity; AppDatabase and its commandLogDao() are the app's own Room classes (not shown)
@Entity(tableName = "command_log")
data class CommandLogEntry(
    @PrimaryKey(autoGenerate = true) val id: Long = 0,
    val timestamp: Long = System.currentTimeMillis(),
    val commandType: String,
    val characteristicUuid: String,
    val gattStatus: Int,
    val latencyMs: Long,
    val retryCount: Int,
    val success: Boolean
)
class CommandAuditLog(private val db: AppDatabase) {
    suspend fun log(command: BleCommand, status: Int, latencyMs: Long) {
        val entry = CommandLogEntry(
            commandType = command::class.simpleName ?: "Unknown",
            characteristicUuid = when (command) {
                is BleCommand.WriteCharacteristic -> command.characteristicUuid.toString()
                is BleCommand.ReadCharacteristic -> command.characteristicUuid.toString()
                is BleCommand.EnableNotification -> command.characteristicUuid.toString()
                else -> "N/A"
            },
            gattStatus = status,
            latencyMs = latencyMs,
            retryCount = command.maxRetries - command.retriesRemaining,
            success = status == BluetoothGatt.GATT_SUCCESS
        )
        db.commandLogDao().insert(entry)
    }
}
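The audit log stores gattStatus as a bare integer. During incident review a readable name helps; this lookup (gattStatusName is a hypothetical helper) covers a handful of common codes:

```kotlin
// Readable names for common GATT status codes, for logs and audit review.
// 132 (GATT_BUSY) and 133 (GATT_ERROR) come from the Android Bluetooth stack
// rather than from BluetoothGatt's public constants.
fun gattStatusName(status: Int): String {
    val name = when (status) {
        0 -> "GATT_SUCCESS"
        2 -> "GATT_READ_NOT_PERMITTED"
        3 -> "GATT_WRITE_NOT_PERMITTED"
        5 -> "GATT_INSUFFICIENT_AUTHENTICATION"
        6 -> "GATT_REQUEST_NOT_SUPPORTED"
        7 -> "GATT_INVALID_OFFSET"
        13 -> "GATT_INVALID_ATTRIBUTE_LENGTH"
        15 -> "GATT_INSUFFICIENT_ENCRYPTION"
        132 -> "GATT_BUSY"
        133 -> "GATT_ERROR"
        143 -> "GATT_CONNECTION_CONGESTED"
        257 -> "GATT_FAILURE"
        else -> "UNKNOWN"
    }
    // Decimal and hex, since stack logs usually print the hex form
    return "%s (%d / 0x%02X)".format(name, status, status)
}
```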
Queue depth monitoring. In production, monitor the queue depth. If the queue grows continuously, it means commands are being enqueued faster than they can be executed. This typically indicates a bug in the calling code (a tight loop generating commands) or a degraded BLE connection where every operation is timing out and retrying. Set up alerts when queue depth exceeds a threshold, and consider dropping low-priority commands when the queue is saturated.
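Shedding load when the queue is saturated could look like this JVM-only sketch, where queue.size serves as the depth gauge. Pending, pendingOrder and shedLoad are hypothetical names, and which commands are safe to drop is an application decision; a stop command should never be among them:

```kotlin
import java.util.concurrent.PriorityBlockingQueue

// Hypothetical pending-command shape for this sketch.
data class Pending(val name: String, val priority: Int, val seq: Long)

// Dispatch order: highest priority first, then oldest first.
val pendingOrder: Comparator<Pending> =
    compareByDescending<Pending> { it.priority }.thenBy { it.seq }

// If more than maxDepth commands are waiting, keep the maxDepth that would be
// dispatched first and return the rest (the lowest-priority and, among
// equals, the newest commands) so the caller can log and fail them.
fun shedLoad(queue: PriorityBlockingQueue<Pending>, maxDepth: Int): List<Pending> {
    if (queue.size <= maxDepth) return emptyList()
    val all = mutableListOf<Pending>()
    queue.drainTo(all)
    all.sortWith(pendingOrder)
    all.take(maxDepth).forEach { queue.offer(it) }
    return all.drop(maxDepth)
}
```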
The command queue is not glamorous code, but it is the difference between a BLE app that works reliably and one that fails in unpredictable ways. Every production BLE app on Android should have one. The investment in building and testing it properly pays off many times over in reduced debugging time, fewer customer support tickets, and the confidence that when your app sends a command to a medical device, that command will either be delivered or fail with a clear error.
Building a BLE-connected medical device app that needs to be rock-solid? Talk to DEVSFLOW Neuro. We build BLE-connected mobile apps for neurotechnology and medical device companies across Canada.