BLE Service Discovery on Android: Caching, Timeouts, and Retry Strategies

If you have spent any meaningful time building BLE-connected apps on Android, you already know that discoverServices() is where things fall apart. The connection might succeed. The GATT callback might fire. But service discovery is the step where Android's Bluetooth stack reveals its worst habits: silent failures, stale caches, and OEM-specific quirks that no documentation warns you about.

At our Toronto office, we build BLE companion apps for neurotechnology devices used in clinical and research settings. For clients like RE-AK Technologies and CLEIO, a failed service discovery does not just mean a bad user experience. It means lost biosignal data, interrupted recording sessions, and researchers who stop trusting the app. Over dozens of production BLE apps shipped across Canada, we have catalogued the most painful service discovery issues and built reliable patterns to handle them.

This article is a technical deep dive into everything that can go wrong with BLE service discovery on Android, and how to build retry logic that actually works in production.

How discoverServices() Works Under the Hood

When you call BluetoothGatt.discoverServices(), the Android Bluetooth stack runs the GATT discovery procedures against the peripheral: ATT Read By Group Type requests enumerate the primary services, Read By Type requests enumerate each service's characteristics, and Find Information requests enumerate the descriptors. The peripheral responds with a series of ATT packets, and the Android stack assembles these into BluetoothGattService objects.
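To make the wire format concrete, here is a minimal sketch of the request that opens a "discover all primary services" exchange, built by hand. The helper names readByGroupTypeRequest and toHex are hypothetical and exist only for illustration; Android builds and sends these PDUs internally, and app code never does this. The opcode (0x10, Read By Group Type Request) and the Primary Service declaration UUID (0x2800) are taken from the Bluetooth Core Specification.

```kotlin
// Sketch only: Android's Bluetooth stack builds this PDU internally.
// ATT opcode for Read By Group Type Request.
val ATT_OP_READ_BY_GROUP_TYPE_REQ = 0x10
// 16-bit UUID of the Primary Service declaration.
val UUID_PRIMARY_SERVICE = 0x2800

// Hypothetical helper: encodes opcode, handle range, and group type.
// All multi-byte ATT fields are little-endian.
fun readByGroupTypeRequest(
    startHandle: Int = 0x0001,
    endHandle: Int = 0xFFFF,
    groupType: Int = UUID_PRIMARY_SERVICE
): ByteArray {
    require(startHandle in 1..endHandle && endHandle <= 0xFFFF) { "invalid handle range" }
    fun le16(v: Int) = byteArrayOf((v and 0xFF).toByte(), ((v shr 8) and 0xFF).toByte())
    return byteArrayOf(ATT_OP_READ_BY_GROUP_TYPE_REQ.toByte()) +
        le16(startHandle) + le16(endHandle) + le16(groupType)
}

// Hypothetical helper: renders a PDU as space-separated hex bytes.
fun ByteArray.toHex(): String =
    joinToString(" ") { "%02X".format(it.toInt() and 0xFF) }
```

With the default arguments, readByGroupTypeRequest().toHex() yields "10 01 00 FF FF 00 28": the opcode, the full handle range 0x0001 to 0xFFFF, and the Primary Service UUID. The peripheral answers with one or more responses, and the client repeats the request from the next handle until the range is exhausted, which is part of why discovery time grows with the size of the GATT database.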

The callback you receive is onServicesDiscovered(gatt: BluetoothGatt, status: Int). Seems simple enough. But there are several things the documentation does not tell you:

Here is the minimal correct pattern for initiating service discovery:

private val scope = CoroutineScope(Dispatchers.Main + SupervisorJob())
private var discoveryTimeoutJob: Job? = null

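private fun startServiceDiscovery(gatt: BluetoothGatt) {
    // Cancel any timeout left over from a previous attempt before arming a new one
    discoveryTimeoutJob?.cancel()
    discoveryTimeoutJob = scope.launch {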
        delay(SERVICE_DISCOVERY_TIMEOUT_MS)
        Log.e(TAG, "Service discovery timed out")
        handleDiscoveryFailure(gatt, reason = "timeout")
    }

    val started = gatt.discoverServices()
    if (!started) {
        discoveryTimeoutJob?.cancel()
        Log.e(TAG, "discoverServices() returned false, retrying...")
        scheduleRetry(gatt, attempt = 1)
    }
}

companion object {
    const val SERVICE_DISCOVERY_TIMEOUT_MS = 10_000L
}

The 10-second timeout is intentionally generous. On some devices, especially older Samsung phones running Android 10, service discovery can legitimately take 5 to 7 seconds when connecting to a peripheral with many characteristics.

The 600ms Delay: Why Immediate Discovery Fails

One of the most well-known (and poorly documented) issues in Android BLE development is that calling discoverServices() immediately after receiving onConnectionStateChange with STATE_CONNECTED will fail on a significant percentage of devices. The failure mode varies: on some phones it returns false, on others it triggers onServicesDiscovered with status 129 (GATT_INTERNAL_ERROR), and on others it simply never calls back.

The root cause is a race condition in the Android Bluetooth stack. When the connection event fires, the underlying L2CAP channel may not be fully established. The GATT client is technically connected but not yet ready to send ATT requests.

The widely adopted workaround is to introduce a delay of 600 milliseconds before calling discoverServices():

override fun onConnectionStateChange(
    gatt: BluetoothGatt,
    status: Int,
    newState: Int
) {
    if (newState == BluetoothProfile.STATE_CONNECTED) {
        Log.d(TAG, "Connected, scheduling service discovery")

        // Critical: delay discovery to let the stack stabilize.
        // Bonded devices may need a longer delay on Samsung, so pick a
        // single delay up front instead of stacking one delay on another.
        val delayMs = if (gatt.device.bondState == BluetoothDevice.BOND_BONDED) {
            1600L
        } else {
            600L
        }
        handler.postDelayed({ startServiceDiscovery(gatt) }, delayMs)
    }
}

The 600ms value comes from the Nordic Semiconductor team's work on their Android tooling; their open-source Android BLE Library applies a similar post-connection delay before discovery. For bonded devices, we use a longer 1600ms delay because Samsung's Bluetooth stack performs additional key exchange steps after the connection event fires, and starting discovery too early during this window will produce status 133 (GATT_ERROR).

Is 600ms Always Enough?

No. In our production analytics across Canadian device deployments, we see the following failure rates with different delays:

We typically use 600ms as the baseline and rely on retry logic to catch the remaining 1-2%. Increasing the delay to 1000ms adds noticeable latency to the connection flow, which matters for user experience in consumer-facing apps.

Samsung's GATT Cache Problem

Samsung devices maintain an aggressive GATT cache that persists across connections, app restarts, and sometimes even device reboots. When a peripheral updates its GATT database (which happens during firmware updates, or when a device has configurable services), Samsung phones will return the old, cached service list instead of performing a fresh discovery.

The symptoms are subtle and infuriating. Your app connects, discovery succeeds with GATT_SUCCESS, and you get a service list. But the characteristics you expect are missing, or they have the wrong properties (read-only instead of notify-enabled, for example). Everything works fine on Pixel phones. Only Samsung users report problems.

There is no public API to clear the GATT cache. However, BluetoothGatt has a hidden refresh() method that can be reached through reflection, and in our experience it works reliably across Samsung devices from Android 8 through Android 14:

private fun refreshGattCache(gatt: BluetoothGatt): Boolean {
    return try {
        val refreshMethod = gatt.javaClass.getMethod("refresh")
        val result = refreshMethod.invoke(gatt) as Boolean
        Log.d(TAG, "GATT cache refresh result: $result")
        result
    } catch (e: Exception) {
        Log.w(TAG, "Failed to refresh GATT cache", e)
        false
    }
}

// Call this BEFORE discoverServices()
override fun onConnectionStateChange(
    gatt: BluetoothGatt,
    status: Int,
    newState: Int
) {
    if (newState == BluetoothProfile.STATE_CONNECTED) {
        handler.postDelayed({
            refreshGattCache(gatt)
            handler.postDelayed({
                startServiceDiscovery(gatt)
            }, 300)
        }, 600)
    }
}

A few important notes about refresh():

Detecting a Stale Cache

We implement a simple validation step after discovery completes. If your peripheral exposes a known set of services, check for them immediately:

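// Guards against an endless refresh loop if the service is genuinely absent
private var staleCacheRefreshes = 0
private val maxStaleCacheRefreshes = 2

override fun onServicesDiscovered(gatt: BluetoothGatt, status: Int) {
    discoveryTimeoutJob?.cancel()

    if (status != BluetoothGatt.GATT_SUCCESS) {
        handleDiscoveryFailure(gatt, reason = "status_$status")
        return
    }

    val services = gatt.services
    val hasExpectedService = services.any {
        it.uuid == EXPECTED_SERVICE_UUID
    }

    if (!hasExpectedService && services.isNotEmpty()) {
        if (staleCacheRefreshes++ >= maxStaleCacheRefreshes) {
            handleDiscoveryFailure(gatt, reason = "expected_service_missing")
            return
        }
        // Likely a stale cache, refresh and retry
        Log.w(TAG, "Expected service not found, refreshing cache")
        refreshGattCache(gatt)
        scope.launch {
            delay(500)
            // Go through startServiceDiscovery() so the timeout is re-armed
            startServiceDiscovery(gatt)
        }
        return
    }

    // Proceed with normal flow
    staleCacheRefreshes = 0
    onDiscoverySuccess(gatt, services)
}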
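The single-UUID check above generalizes naturally to a set of required services. The following is a minimal sketch of that comparison as a pure function, so it can be unit tested without a device. DiscoveryCheck and checkDiscoveredServices are hypothetical names introduced here; in an app, the discovered collection would come from gatt.services.map { it.uuid }.

```kotlin
import java.util.UUID

// Outcome of comparing discovered services against what the firmware should expose.
sealed interface DiscoveryCheck {
    object Ok : DiscoveryCheck
    object Empty : DiscoveryCheck                              // nothing discovered at all
    data class Stale(val missing: Set<UUID>) : DiscoveryCheck  // some expected services absent
}

// Hypothetical helper: reports which expected services are missing, if any.
fun checkDiscoveredServices(
    discovered: Collection<UUID>,
    expected: Set<UUID>
): DiscoveryCheck {
    if (discovered.isEmpty()) return DiscoveryCheck.Empty
    val missing = expected - discovered.toSet()
    return if (missing.isEmpty()) DiscoveryCheck.Ok else DiscoveryCheck.Stale(missing)
}
```

Returning the set of missing UUIDs, rather than a boolean, makes the log line for a stale cache far more useful: it shows whether the phone is missing one newly added service or is serving an entirely different database.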

GATT_SUCCESS Does Not Mean What You Think

Status code 0 (BluetoothGatt.GATT_SUCCESS) in the onServicesDiscovered callback means the ATT discovery procedure completed without a transport-level error. It does not guarantee that:

We have seen cases, particularly on Xiaomi and Huawei devices, where GATT_SUCCESS is returned with an empty service list. This tends to happen when the Bluetooth stack is under load (multiple BLE connections active) or when the phone's Bluetooth service has been recently restarted.

The other status codes you will encounter in production:

Build your callback handler to be explicit about each status:

override fun onServicesDiscovered(gatt: BluetoothGatt, status: Int) {
    discoveryTimeoutJob?.cancel()

    when (status) {
        BluetoothGatt.GATT_SUCCESS -> {
            val services = gatt.services
            if (services.isNullOrEmpty()) {
                handleDiscoveryFailure(gatt, "empty_services")
            } else {
                validateAndProceed(gatt, services)
            }
        }
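        BluetoothGatt.GATT_INSUFFICIENT_AUTHENTICATION, // 5
        BluetoothGatt.GATT_INSUFFICIENT_ENCRYPTION -> { // 15
            // Authentication/encryption required. Note that 8 is
            // GATT_INSUFFICIENT_AUTHORIZATION, which bonding does not fix.
            Log.w(TAG, "Discovery requires bonding, status: $status")
            initiateBonding(gatt.device)
        }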
        129 -> {
            // Internal error, retry with delay
            Log.e(TAG, "GATT_INTERNAL_ERROR during discovery")
            scheduleRetry(gatt, attempt = 1, delayMs = 1000)
        }
        133 -> {
            // Generic error, disconnect and reconnect
            Log.e(TAG, "GATT_ERROR during discovery")
            gatt.close()
            scheduleReconnect(gatt.device)
        }
        else -> {
            Log.e(TAG, "Unknown discovery status: $status")
            handleDiscoveryFailure(gatt, "unknown_status_$status")
        }
    }
}
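One way to keep this branching testable is to separate the decision from the side effects. The sketch below is an assumption about how that could look, not part of any Android API: DiscoveryAction and classifyDiscoveryStatus are hypothetical names, and the numeric values are the ones discussed in this section (0 for GATT_SUCCESS, 5 for GATT_INSUFFICIENT_AUTHENTICATION, 15 for GATT_INSUFFICIENT_ENCRYPTION, 129 for GATT_INTERNAL_ERROR, 133 for GATT_ERROR). Treating a successful but empty result as retryable is also an assumption.

```kotlin
// Hypothetical set of outcomes a discovery callback can lead to.
enum class DiscoveryAction { PROCEED, BOND, RETRY_AFTER_DELAY, RECONNECT, FAIL }

// Maps a raw onServicesDiscovered status plus the number of services found
// to the action the connection manager should take next.
fun classifyDiscoveryStatus(status: Int, serviceCount: Int): DiscoveryAction =
    when (status) {
        // GATT_SUCCESS with no services is treated as a failed attempt (assumption)
        0 -> if (serviceCount > 0) DiscoveryAction.PROCEED else DiscoveryAction.RETRY_AFTER_DELAY
        5, 15 -> DiscoveryAction.BOND             // insufficient authentication / encryption
        129 -> DiscoveryAction.RETRY_AFTER_DELAY  // GATT_INTERNAL_ERROR
        133 -> DiscoveryAction.RECONNECT          // GATT_ERROR
        else -> DiscoveryAction.FAIL
    }
```

The callback then shrinks to a when over DiscoveryAction that performs the logging, bonding, or reconnect, and the mapping itself can be covered by plain JVM unit tests.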

Retry with Exponential Backoff

A single retry attempt is not enough for production BLE apps. Radio conditions change, the Bluetooth stack recovers from transient errors, and the peripheral might need a moment to stabilize after a firmware operation. But blind retries without backoff will hammer the Bluetooth stack and make things worse.

Here is the retry strategy we use across our neurotechnology BLE apps:

class ServiceDiscoveryManager(
    private val scope: CoroutineScope,
    private val maxRetries: Int = 3,
    private val baseDelayMs: Long = 1000
) {
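    private var currentAttempt = 0
    private var timeoutJob: Job? = null

    fun startDiscovery(gatt: BluetoothGatt) {
        currentAttempt = 0
        attemptDiscovery(gatt)
    }

    // Call from onServicesDiscovered once the service list has been validated,
    // so the pending timeout does not fire after a successful discovery.
    fun onDiscoverySuccess(gatt: BluetoothGatt) {
        timeoutJob?.cancel()
        currentAttempt = 0
    }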

    private fun attemptDiscovery(gatt: BluetoothGatt) {
        val timeout = SERVICE_DISCOVERY_TIMEOUT_MS +
            (currentAttempt * 2000L)

        timeoutJob = scope.launch {
            delay(timeout)
            onAttemptFailed(gatt, "timeout")
        }

        val started = gatt.discoverServices()
        if (!started) {
            timeoutJob?.cancel()
            onAttemptFailed(gatt, "discoverServices_returned_false")
        }
    }

    fun onAttemptFailed(gatt: BluetoothGatt, reason: String) {
        timeoutJob?.cancel()
        currentAttempt++

        if (currentAttempt > maxRetries) {
            Log.e(TAG, "Service discovery failed after $maxRetries retries")
            listener?.onDiscoveryFailed(gatt, reason)
            return
        }

        val delayMs = calculateBackoff(currentAttempt)
        Log.w(TAG, "Discovery attempt $currentAttempt failed ($reason), " +
            "retrying in ${delayMs}ms")

        // Refresh cache on retry attempts 2+
        if (currentAttempt >= 2) {
            refreshGattCache(gatt)
        }

        scope.launch {
            delay(delayMs)
            attemptDiscovery(gatt)
        }
    }

    private fun calculateBackoff(attempt: Int): Long {
        val exponentialDelay = baseDelayMs * (1L shl (attempt - 1))
        val jitter = (0..500).random().toLong()
        return minOf(exponentialDelay + jitter, MAX_BACKOFF_MS)
    }

    companion object {
        const val SERVICE_DISCOVERY_TIMEOUT_MS = 10_000L
        const val MAX_BACKOFF_MS = 8_000L
    }
}

Key design decisions in this implementation:
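To make the timing concrete, the sketch below restates the backoff and per-attempt timeout arithmetic from the class as pure functions. The names backoffMs and attemptTimeoutMs are hypothetical, and jitter is passed in as a parameter (an assumption made here so the schedule is reproducible) instead of being drawn at random.

```kotlin
// Same formula as calculateBackoff, with jitter supplied by the caller.
fun backoffMs(
    attempt: Int,
    baseDelayMs: Long = 1_000,
    jitterMs: Long = 0,
    maxMs: Long = 8_000
): Long {
    require(attempt >= 1) { "attempt is 1-based" }
    val exponential = baseDelayMs * (1L shl (attempt - 1))
    return minOf(exponential + jitterMs, maxMs)
}

// Per-attempt timeout as computed in attemptDiscovery:
// the base timeout plus 2 seconds for every attempt that has already failed.
fun attemptTimeoutMs(attemptIndex: Int, baseTimeoutMs: Long = 10_000): Long =
    baseTimeoutMs + attemptIndex * 2_000L
```

With the defaults, the delays before retries 1, 2, and 3 are 1000, 2000, and 4000 ms (plus jitter), and the cap only comes into play from a fourth retry onward. If every attempt runs to its timeout, the four attempts take 10 + 12 + 14 + 16 seconds and the backoffs add at least 7 more, so the worst case before giving up is roughly a minute. That figure is worth keeping in mind when deciding what the UI shows while discovery is in progress.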

Clearing Cache on Bond State Changes

When a device's bond state changes (bonding is created, removed, or re-established), the GATT cache frequently becomes invalid. This is particularly common during the following scenarios:

Register a BroadcastReceiver to listen for bond state changes and proactively clear the cache:

class BondStateReceiver(
    private val onBondStateChanged: (BluetoothDevice, Int, Int) -> Unit
) : BroadcastReceiver() {

    override fun onReceive(context: Context, intent: Intent) {
        if (intent.action != BluetoothDevice.ACTION_BOND_STATE_CHANGED) return

        val device = intent.getParcelableExtra<BluetoothDevice>(
            BluetoothDevice.EXTRA_DEVICE
        ) ?: return

        val previousState = intent.getIntExtra(
            BluetoothDevice.EXTRA_PREVIOUS_BOND_STATE,
            BluetoothDevice.BOND_NONE
        )
        val newState = intent.getIntExtra(
            BluetoothDevice.EXTRA_BOND_STATE,
            BluetoothDevice.BOND_NONE
        )

        Log.d(TAG, "Bond state changed: ${device.address} " +
            "$previousState -> $newState")

        onBondStateChanged(device, previousState, newState)
    }
}

// In your connection manager:
private val bondReceiver = BondStateReceiver { device, prevState, newState ->
    val gatt = activeConnections[device.address] ?: return@BondStateReceiver

    when {
        newState == BluetoothDevice.BOND_BONDED -> {
            // New bond established, refresh and re-discover
            refreshGattCache(gatt)
            handler.postDelayed({
                discoveryManager.startDiscovery(gatt)
            }, 1600)
        }
        prevState == BluetoothDevice.BOND_BONDED &&
        newState == BluetoothDevice.BOND_NONE -> {
            // Bond removed, clear cache and disconnect
            refreshGattCache(gatt)
            gatt.disconnect()
        }
    }
}

Register this receiver with an IntentFilter for BluetoothDevice.ACTION_BOND_STATE_CHANGED. Make sure to register it before initiating any connections, and unregister it when your connection manager is destroyed.
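The decision inside the receiver callback can also be expressed as a small pure function, which makes the transitions easy to test without broadcasting real intents. This is a sketch under assumptions: BondAction and actionForBondChange are hypothetical names, and the integer values are those of BluetoothDevice.BOND_NONE (10), BOND_BONDING (11), and BOND_BONDED (12), redeclared locally so the example runs on a plain JVM.

```kotlin
// Local copies of the BluetoothDevice bond state constants.
val BOND_NONE = 10
val BOND_BONDING = 11
val BOND_BONDED = 12

// Hypothetical set of reactions to a bond state transition.
enum class BondAction { REFRESH_AND_REDISCOVER, REFRESH_AND_DISCONNECT, IGNORE }

// Mirrors the when block in the receiver: a new bond triggers a refresh and
// re-discovery, a removed bond triggers a refresh and disconnect.
fun actionForBondChange(previous: Int, new: Int): BondAction = when {
    new == BOND_BONDED -> BondAction.REFRESH_AND_REDISCOVER
    previous == BOND_BONDED && new == BOND_NONE -> BondAction.REFRESH_AND_DISCONNECT
    else -> BondAction.IGNORE
}
```

Note that a failed pairing attempt (BOND_BONDING to BOND_NONE) falls through to IGNORE here, matching the receiver above, which only reacts when an established bond is removed.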

OEM-Specific Discovery Quirks

Beyond Samsung's caching behavior, several other OEM-specific issues affect service discovery. Here is a summary of what we have observed in production across the most common devices in the Canadian market:

Samsung (One UI / Android 10-14)

Google Pixel (Android 12-15)

Xiaomi / Redmi (MIUI)

Huawei (EMUI / HarmonyOS)

Putting It All Together: A Production-Ready Discovery Flow

Here is the complete flow we recommend for service discovery in production BLE applications. This pattern handles the timing issues, caching problems, and OEM quirks described throughout this article:

class BleConnectionManager(
    private val context: Context,
    private val scope: CoroutineScope = CoroutineScope(Dispatchers.Main + SupervisorJob())
) {
    private val discoveryManager = ServiceDiscoveryManager(scope)
    private val activeGatts = mutableMapOf<String, BluetoothGatt>()

    private val gattCallback = object : BluetoothGattCallback() {
        override fun onConnectionStateChange(
            gatt: BluetoothGatt, status: Int, newState: Int
        ) {
            when (newState) {
                BluetoothProfile.STATE_CONNECTED -> {
                    activeGatts[gatt.device.address] = gatt
                    // Named delayMs so it does not shadow the delay() function
                    val delayMs = if (gatt.device.bondState ==
                        BluetoothDevice.BOND_BONDED) 1600L else 600L

                    scope.launch {
                        delay(delayMs)
                        discoveryManager.startDiscovery(gatt)
                    }
                }
                BluetoothProfile.STATE_DISCONNECTED -> {
                    activeGatts.remove(gatt.device.address)
                    gatt.close()
                }
            }
        }

        override fun onServicesDiscovered(
            gatt: BluetoothGatt, status: Int
        ) {
            when (status) {
                BluetoothGatt.GATT_SUCCESS -> {
                    val services = gatt.services
                    if (services.isNullOrEmpty()) {
                        discoveryManager.onAttemptFailed(
                            gatt, "empty_services"
                        )
                    } else {
                        discoveryManager.onDiscoverySuccess(gatt)
                        onReady(gatt, services)
                    }
                }
                BluetoothGatt.GATT_INSUFFICIENT_AUTHENTICATION, // 5
                BluetoothGatt.GATT_INSUFFICIENT_ENCRYPTION -> // 15
                    initiateBonding(gatt.device)
                else -> discoveryManager.onAttemptFailed(
                    gatt, "status_$status"
                )
            }
        }
    }

    private fun onReady(
        gatt: BluetoothGatt,
        services: List<BluetoothGattService>
    ) {
        // Connection is fully ready for use
        Log.i(TAG, "Discovery complete: ${services.size} services")
        listener?.onDeviceReady(gatt)
    }
}

This is not the simplest possible code. But simple code does not survive contact with the Android BLE ecosystem. Every delay, every retry, and every cache refresh in this pattern exists because a real device, in a real user's hands, failed without it.

Monitoring and Diagnostics

Finally, instrument your service discovery flow. In production, you need to know which devices are failing, what status codes they produce, and how many retry attempts are needed before success. We log the following metrics for every connection:

This data lets you make informed decisions about your delay values, retry counts, and whether you need OEM-specific workarounds. Without it, you are debugging BLE issues in the dark.
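As a rough illustration of what such instrumentation could feed, here is a minimal sketch of a per-connection record and a per-model summary. Everything in it is an assumption made for the example: the DiscoveryEvent fields, the ModelStats shape, and the summarizeByModel helper are hypothetical, and a real pipeline would send these records to whatever analytics backend the app already uses.

```kotlin
// Hypothetical record captured once per connection attempt.
data class DiscoveryEvent(
    val manufacturer: String,
    val model: String,
    val finalStatus: Int,   // last status seen in onServicesDiscovered
    val attempts: Int,      // total discovery attempts, including the first
    val durationMs: Long    // time from STATE_CONNECTED to success or give-up
) {
    val succeeded: Boolean get() = finalStatus == 0
}

// Hypothetical aggregate for one device model.
data class ModelStats(val total: Int, val failureRate: Double, val meanAttempts: Double)

// Groups events by "manufacturer model" and computes failure rate and mean attempts.
fun summarizeByModel(events: List<DiscoveryEvent>): Map<String, ModelStats> =
    events.groupBy { "${it.manufacturer} ${it.model}" }
        .mapValues { (_, group) ->
            ModelStats(
                total = group.size,
                failureRate = group.count { !it.succeeded }.toDouble() / group.size,
                meanAttempts = group.map { it.attempts }.average()
            )
        }
```

Even a summary this small is enough to see whether a longer initial delay would pay off for a given model: a high mean attempt count with a low failure rate suggests the retries are compensating for a delay that is too short.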

Service discovery is one of the most fragile steps in the Android BLE connection lifecycle, but with the right combination of timing delays, cache management, and retry strategies, it becomes predictable. Build defensively, instrument thoroughly, and test on as many physical devices as you can get your hands on.

Struggling with BLE service discovery issues on Android? Talk to DEVSFLOW Neuro. We build BLE-connected mobile apps for neurotechnology and medical device companies across Canada.