BLE Service Discovery on Android: Caching, Timeouts, and Retry Strategies
If you have spent any meaningful time building BLE-connected apps on Android, you already know that discoverServices() is where things fall apart. The connection might succeed. The GATT callback might fire. But service discovery is the step where Android's Bluetooth stack reveals its worst habits: silent failures, stale caches, and OEM-specific quirks that no documentation warns you about.
At our Toronto office, we build BLE companion apps for neurotechnology devices used in clinical and research settings. For clients like RE-AK Technologies and CLEIO, a failed service discovery does not just mean a bad user experience. It means lost biosignal data, interrupted recording sessions, and researchers who stop trusting the app. Over dozens of production BLE apps shipped across Canada, we have catalogued the most painful service discovery issues and built reliable patterns to handle them.
This article is a technical deep dive into everything that can go wrong with BLE service discovery on Android, and how to build retry logic that actually works in production.
How discoverServices() Works Under the Hood
When you call BluetoothGatt.discoverServices(), the Android Bluetooth stack runs the GATT discovery procedures against the peripheral: ATT Read By Group Type requests to enumerate primary services, Read By Type requests for their characteristics, and Find Information requests for their descriptors. The peripheral answers with a series of ATT responses, and the Android stack assembles these into BluetoothGattService objects.
The callback you receive is onServicesDiscovered(gatt: BluetoothGatt, status: Int). Seems simple enough. But there are several things the documentation does not tell you:
- The method returns a boolean. If discoverServices() returns false, no callback will ever fire. Your app will wait forever unless you implement your own timeout.
- A status of BluetoothGatt.GATT_SUCCESS does not mean all services were discovered. It means the discovery procedure completed without a protocol-level error. The service list can still be empty or incomplete.
- The call is asynchronous, and Android's GATT client does not queue operations for you. If you call discoverServices() while another GATT operation is pending (a read, write, or MTU negotiation), it will silently return false on many devices.
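Because the stack does not serialize GATT operations, production apps usually funnel every read, write, MTU request, and discovery call through a single queue. The sketch below shows the idea in plain Kotlin; `GattOperationQueue` and its methods are hypothetical names, and each operation is modeled as a lambda returning whatever the underlying Android call returns (for example `gatt.discoverServices()`). It assumes you call `onOperationComplete()` from the matching `BluetoothGattCallback` method.

```kotlin
/**
 * Hypothetical serial queue for GATT operations: at most one operation is
 * in flight. Each operation returns the Boolean that the underlying Android
 * call returns (e.g. gatt.discoverServices()).
 * Not thread-safe: call every method from a single thread.
 */
class GattOperationQueue {
    private val pending = ArrayDeque<Pair<String, () -> Boolean>>()
    private var inFlight: String? = null

    val currentOperation: String? get() = inFlight
    val pendingCount: Int get() = pending.size

    fun enqueue(name: String, operation: () -> Boolean) {
        pending.addLast(name to operation)
        if (inFlight == null) startNext()
    }

    /** Call from the BluetoothGattCallback method that ends the current operation. */
    fun onOperationComplete() {
        inFlight = null
        startNext()
    }

    private fun startNext() {
        while (inFlight == null && pending.isNotEmpty()) {
            val (name, operation) = pending.removeFirst()
            if (operation()) {
                // Started: wait for its callback before running anything else
                inFlight = name
            }
            // A false return means no callback will ever arrive, so move on.
            // A real implementation would also report this failure.
        }
    }
}
```

Discovery then becomes one more queued operation, e.g. `queue.enqueue("discover") { gatt.discoverServices() }`, with `queue.onOperationComplete()` called from `onServicesDiscovered`.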
Here is the minimal correct pattern for initiating service discovery:
private val scope = CoroutineScope(Dispatchers.Main + SupervisorJob())
private var discoveryTimeoutJob: Job? = null
private fun startServiceDiscovery(gatt: BluetoothGatt) {
discoveryTimeoutJob = scope.launch {
delay(SERVICE_DISCOVERY_TIMEOUT_MS)
Log.e(TAG, "Service discovery timed out")
handleDiscoveryFailure(gatt, reason = "timeout")
}
val started = gatt.discoverServices()
if (!started) {
discoveryTimeoutJob?.cancel()
Log.e(TAG, "discoverServices() returned false, retrying...")
scheduleRetry(gatt, attempt = 1)
}
}
companion object {
const val SERVICE_DISCOVERY_TIMEOUT_MS = 10_000L
}
The 10-second timeout is intentionally generous. On some devices, especially older Samsung phones running Android 10, service discovery can legitimately take 5 to 7 seconds when connecting to a peripheral with many characteristics.
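One gap remains in this pattern: onServicesDiscovered arrives on a Binder thread while the timeout runs on the main dispatcher, so a callback that lands just as the timeout fires can be handled twice. A one-shot guard per attempt makes each attempt resolve exactly once. This is a minimal plain-Kotlin sketch; `DiscoveryAttempt` and its result strings are hypothetical.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

/**
 * Hypothetical one-shot guard for a single discovery attempt.
 * Whichever path calls finish() first (the GATT callback or the
 * timeout) wins; later calls are ignored and return false.
 */
class DiscoveryAttempt(val attemptNumber: Int) {
    private val finished = AtomicBoolean(false)

    @Volatile
    var outcome: String? = null
        private set

    val isFinished: Boolean get() = finished.get()

    /** Returns true only for the first caller, across threads. */
    fun finish(result: String): Boolean {
        if (!finished.compareAndSet(false, true)) return false
        outcome = result
        return true
    }
}
```

In onServicesDiscovered you would proceed only if `attempt.finish("callback")` returns true, and the timeout coroutine would do the same with `"timeout"`.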
The 600ms Delay: Why Immediate Discovery Fails
One of the most well-known (and poorly documented) issues in Android BLE development is that calling discoverServices() immediately after receiving onConnectionStateChange with STATE_CONNECTED will fail on a significant percentage of devices. The failure mode varies: on some phones it returns false, on others it triggers onServicesDiscovered with status 129 (GATT_INTERNAL_ERROR), and on others it simply never calls back.
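The root cause is a timing window in the Android Bluetooth stack. When the connection event fires, the link is often still settling: the phone and peripheral may be exchanging connection parameter updates and, for bonded devices, re-establishing encryption. The GATT client reports itself as connected, but ATT requests sent during this window can fail or go unanswered.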
The widely adopted workaround is to introduce a delay of 600 milliseconds before calling discoverServices():
// handler is assumed to be a Handler(Looper.getMainLooper()) owned by the enclosing class
override fun onConnectionStateChange(
    gatt: BluetoothGatt,
    status: Int,
    newState: Int
) {
    if (status == BluetoothGatt.GATT_SUCCESS &&
        newState == BluetoothProfile.STATE_CONNECTED
    ) {
        Log.d(TAG, "Connected, scheduling service discovery")
        // Critical: delay discovery to let the stack stabilize.
        // Bonded devices (especially on Samsung) get 1600ms instead of 600ms.
        // Pick one delay up front: nesting postDelayed calls would add them (2200ms).
        val discoveryDelayMs =
            if (gatt.device.bondState == BluetoothDevice.BOND_BONDED) 1600L else 600L
        handler.postDelayed({ startServiceDiscovery(gatt) }, discoveryDelayMs)
    }
}
The 600ms value traces back to Nordic Semiconductor's work on Android BLE and is widely reused as a community rule of thumb; it is not a documented platform constant. For bonded devices, we use a longer 1600ms delay: right after the connection event fires, the stack re-encrypts the link with the stored bonding keys, and starting discovery during this window tends to produce status 133 (GATT_ERROR), most often on Samsung's Bluetooth stack.
Is 600ms Always Enough?
No. In our production analytics across Canadian device deployments, we see the following failure rates with different delays:
- 0ms delay: 12-18% failure rate, heavily skewed toward Samsung devices
- 200ms delay: 5-8% failure rate
- 600ms delay: 1-2% failure rate
- 1000ms delay: Under 0.5% failure rate
We typically use 600ms as the baseline and rely on retry logic to catch the remaining 1-2%. Increasing the delay to 1000ms adds noticeable latency to the connection flow, which matters for user experience in consumer-facing apps.
Samsung's GATT Cache Problem
Samsung devices maintain an aggressive GATT cache that persists across connections, app restarts, and sometimes even device reboots. When a peripheral updates its GATT database (which happens during firmware updates, or when a device has configurable services), Samsung phones will return the old, cached service list instead of performing a fresh discovery.
The symptoms are subtle and infuriating. Your app connects, discovery succeeds with GATT_SUCCESS, and you get a service list. But the characteristics you expect are missing, or they have the wrong properties (read-only instead of notify-enabled, for example). Everything works fine on Pixel phones. Only Samsung users report problems.
There is no public API to clear the GATT cache. However, there is a hidden method that works reliably across Samsung devices from Android 8 through Android 14:
private fun refreshGattCache(gatt: BluetoothGatt): Boolean {
return try {
val refreshMethod = gatt.javaClass.getMethod("refresh")
val result = refreshMethod.invoke(gatt) as Boolean
Log.d(TAG, "GATT cache refresh result: $result")
result
} catch (e: Exception) {
Log.w(TAG, "Failed to refresh GATT cache", e)
false
}
}
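// Call refresh() BEFORE discoverServices(), but only when the GATT database
// may have changed. shouldRefreshCache() is a hypothetical app-level check
// (e.g. "we just ran a firmware update" or "the last discovery looked stale").
override fun onConnectionStateChange(
    gatt: BluetoothGatt,
    status: Int,
    newState: Int
) {
    if (newState == BluetoothProfile.STATE_CONNECTED) {
        handler.postDelayed({
            if (shouldRefreshCache(gatt.device)) {
                refreshGattCache(gatt)
                // The cache clear is asynchronous; wait before rediscovering
                handler.postDelayed({ startServiceDiscovery(gatt) }, 300)
            } else {
                startServiceDiscovery(gatt)
            }
        }, 600)
    }
}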
A few important notes about refresh():
- It is a hidden API. Google has not removed it since it was introduced in Android 4.3, and it is present in AOSP, but it is not part of the public SDK. Use it with the understanding that it could theoretically break in a future release.
- After calling refresh(), you need an additional short delay (200-300ms) before calling discoverServices(). The cache clear is asynchronous internally.
- Do not call refresh() on every connection. Only call it when you have reason to believe the GATT database has changed, or when you detect missing/unexpected services after discovery.
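One way to encode "only when you have reason to believe the GATT database has changed" is a small per-device flag that other parts of the app set and a validated discovery clears. The sketch below is plain Kotlin; `CacheRefreshPolicy` and `RefreshReason` are hypothetical, the reasons are the triggers this article describes (firmware updates, bond changes, missing services), and the in-memory map stands in for real persistence.

```kotlin
/** Triggers this article gives for suspecting a changed GATT database. */
enum class RefreshReason { FIRMWARE_UPDATED, BOND_CHANGED, MISSING_SERVICES }

/**
 * Hypothetical per-device policy for the hidden refresh() call: refresh
 * only when something has flagged the cached database as suspect, and
 * clear the flag once a discovery passes validation.
 * The in-memory map stands in for whatever persistence a real app uses.
 */
class CacheRefreshPolicy {
    private val pending = mutableMapOf<String, RefreshReason>()

    fun markNeedsRefresh(address: String, reason: RefreshReason) {
        pending[address] = reason
    }

    fun shouldRefresh(address: String): Boolean = address in pending

    fun reasonFor(address: String): RefreshReason? = pending[address]

    /** Call after discovery succeeded and the expected services were present. */
    fun onDiscoveryValidated(address: String) {
        pending.remove(address)
    }
}
```

The flag is keyed by device address, so a refresh triggered by one peripheral's firmware update never forces a slow rediscovery on the others.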
Detecting a Stale Cache
We implement a simple validation step after discovery completes. If your peripheral exposes a known set of services, check for them immediately:
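// cacheRefreshAttempted is a hypothetical per-connection flag, reset on each new connection
override fun onServicesDiscovered(gatt: BluetoothGatt, status: Int) {
    discoveryTimeoutJob?.cancel()
    if (status != BluetoothGatt.GATT_SUCCESS) {
        handleDiscoveryFailure(gatt, reason = "status_$status")
        return
    }
    val services = gatt.services
    val hasExpectedService = services.any {
        it.uuid == EXPECTED_SERVICE_UUID
    }
    if (!hasExpectedService) {
        if (cacheRefreshAttempted) {
            // Already refreshed once: stop instead of looping forever
            handleDiscoveryFailure(gatt, reason = "expected_service_missing")
            return
        }
        // Likely a stale cache (or an empty result): refresh once and retry
        Log.w(TAG, "Expected service not found, refreshing cache")
        cacheRefreshAttempted = true
        refreshGattCache(gatt)
        scope.launch {
            delay(500)
            // Re-arms the discovery timeout, unlike a bare gatt.discoverServices()
            startServiceDiscovery(gatt)
        }
        return
    }
    // Proceed with normal flow
    onDiscoverySuccess(gatt, services)
}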
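A single expected UUID catches the most common stale-cache case, but the symptoms described earlier also include missing characteristics and wrong properties. The sketch below generalizes the check: it reports every required service UUID that is absent and tests a characteristic's property bitmask for notify support. The function names are hypothetical; `0x10` is the value of `BluetoothGattCharacteristic.PROPERTY_NOTIFY`. On-device you would pass `gatt.services.map { it.uuid }` and `characteristic.properties`.

```kotlin
import java.util.UUID

/** Same value as BluetoothGattCharacteristic.PROPERTY_NOTIFY. */
const val PROPERTY_NOTIFY = 0x10

/**
 * Hypothetical validation helper: returns the required service UUIDs that
 * are absent from the discovered list. Empty means the database looks
 * complete; non-empty suggests a stale cache (or a genuinely missing service).
 */
fun missingServices(discovered: List<UUID>, required: Set<UUID>): Set<UUID> =
    required - discovered.toSet()

/** True if a characteristic's property bitmask advertises notifications. */
fun supportsNotify(properties: Int): Boolean = (properties and PROPERTY_NOTIFY) != 0
```

A read-only characteristic that should be notify-enabled (properties `0x02` instead of `0x12`) is exactly the kind of stale entry that `supportsNotify` flags.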
GATT_SUCCESS Does Not Mean What You Think
Status code 0 (BluetoothGatt.GATT_SUCCESS) in the onServicesDiscovered callback means the ATT discovery procedure completed without a transport-level error. It does not guarantee that:
- All services are present in the result
- The service UUIDs match what the peripheral actually exposes
- Characteristic properties are accurate
- Descriptors were fully enumerated
We have seen cases, particularly on Xiaomi and Huawei devices, where GATT_SUCCESS is returned with an empty service list. This tends to happen when the Bluetooth stack is under load (multiple BLE connections active) or when the phone's Bluetooth service has been recently restarted.
The other status codes you will encounter in production:
- Status 5 (GATT_INSUFFICIENT_AUTHENTICATION): The peripheral requires bonding before it will reveal its services. This is common with medical devices that protect their GATT database behind encryption.
- Status 15 (GATT_INSUFFICIENT_ENCRYPTION): Similar to status 5, but specifically indicates that the link is not encrypted. You may need to initiate bonding first.
- Status 129 (GATT_INTERNAL_ERROR, 0x81): A generic stack error. Usually caused by calling discovery too early or while another operation is in progress. Like 133, this code comes from the native Bluetooth stack and has no public constant in BluetoothGatt.
- Status 133 (GATT_ERROR, 0x85): The most common and least helpful status. Can mean almost anything. On Samsung, it often indicates a timing issue. On Pixel, it frequently signals that the peripheral disconnected during discovery.
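Raw integers are hard to read in logs and dashboards, so it helps to map them to names at the point of logging. The sketch below is a hypothetical helper: 0, 5, and 15 match the public `BluetoothGatt` constants (`GATT_SUCCESS`, `GATT_INSUFFICIENT_AUTHENTICATION`, `GATT_INSUFFICIENT_ENCRYPTION`), 8 is the ATT "Insufficient Authorization" error code, and 129 and 133 are native-stack codes.

```kotlin
/**
 * Hypothetical helper that turns onServicesDiscovered status codes into
 * readable names for logs and analytics. Unknown codes are kept in hex.
 */
fun discoveryStatusName(status: Int): String = when (status) {
    0 -> "GATT_SUCCESS"
    5 -> "GATT_INSUFFICIENT_AUTHENTICATION"
    8 -> "INSUFFICIENT_AUTHORIZATION"      // ATT error 0x08
    15 -> "GATT_INSUFFICIENT_ENCRYPTION"
    129 -> "GATT_INTERNAL_ERROR"            // 0x81, native stack
    133 -> "GATT_ERROR"                     // 0x85, native stack
    else -> "UNKNOWN_0x${status.toString(16)}"
}
```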
Build your callback handler to be explicit about each status:
override fun onServicesDiscovered(gatt: BluetoothGatt, status: Int) {
discoveryTimeoutJob?.cancel()
when (status) {
BluetoothGatt.GATT_SUCCESS -> {
val services = gatt.services
if (services.isNullOrEmpty()) {
handleDiscoveryFailure(gatt, "empty_services")
} else {
validateAndProceed(gatt, services)
}
}
BluetoothGatt.GATT_INSUFFICIENT_AUTHENTICATION, BluetoothGatt.GATT_INSUFFICIENT_ENCRYPTION -> {
// Authentication/encryption required
Log.w(TAG, "Discovery requires bonding, status: $status")
initiateBonding(gatt.device)
}
129 -> {
// Internal error, retry with delay
Log.e(TAG, "GATT_INTERNAL_ERROR during discovery")
scheduleRetry(gatt, attempt = 1, delayMs = 1000)
}
133 -> {
// Generic error, disconnect and reconnect
Log.e(TAG, "GATT_ERROR during discovery")
gatt.close()
scheduleReconnect(gatt.device)
}
else -> {
Log.e(TAG, "Unknown discovery status: $status")
handleDiscoveryFailure(gatt, "unknown_status_$status")
}
}
}
Retry with Exponential Backoff
A single retry attempt is not enough for production BLE apps. Radio conditions change, the Bluetooth stack recovers from transient errors, and the peripheral might need a moment to stabilize after a firmware operation. But blind retries without backoff will hammer the Bluetooth stack and make things worse.
Here is the retry strategy we use across our neurotechnology BLE apps:
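import android.bluetooth.BluetoothGatt
import android.util.Log
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch

// Tracks one connection's discovery attempts. Call every method on the
// scope's thread (main), e.g. by posting GATT callbacks there first.
class ServiceDiscoveryManager(
    private val scope: CoroutineScope,
    private val maxRetries: Int = 3,
    private val baseDelayMs: Long = 1000
) {
    interface Listener {
        fun onDiscoveryFailed(gatt: BluetoothGatt, reason: String)
    }

    var listener: Listener? = null
    private var currentAttempt = 0
    private var timeoutJob: Job? = null
    private var retryJob: Job? = null

    fun startDiscovery(gatt: BluetoothGatt) {
        cancelPendingWork()
        currentAttempt = 0
        attemptDiscovery(gatt)
    }

    // Call from onServicesDiscovered once the result has been validated
    fun onDiscoverySuccess(gatt: BluetoothGatt) {
        cancelPendingWork()
        Log.i(TAG, "Discovery for ${gatt.device.address} succeeded after $currentAttempt retries")
        currentAttempt = 0
    }

    private fun attemptDiscovery(gatt: BluetoothGatt) {
        // Later attempts get more time: +2s per retry
        val timeout = SERVICE_DISCOVERY_TIMEOUT_MS + (currentAttempt * 2000L)
        timeoutJob = scope.launch {
            delay(timeout)
            onAttemptFailed(gatt, "timeout")
        }
        val started = gatt.discoverServices()
        if (!started) {
            onAttemptFailed(gatt, "discoverServices_returned_false")
        }
    }

    fun onAttemptFailed(gatt: BluetoothGatt, reason: String) {
        cancelPendingWork()
        currentAttempt++
        if (currentAttempt > maxRetries) {
            Log.e(TAG, "Service discovery failed after $maxRetries retries")
            listener?.onDiscoveryFailed(gatt, reason)
            return
        }
        val delayMs = calculateBackoff(currentAttempt)
        Log.w(TAG, "Discovery attempt $currentAttempt failed ($reason), " +
            "retrying in ${delayMs}ms")
        // Refresh the cache from the second failure on (the first retry also failed).
        // refreshGattCache() is the reflection helper shown earlier, assumed to be
        // reachable from here (e.g. as a top-level function).
        if (currentAttempt >= 2) {
            refreshGattCache(gatt)
        }
        retryJob = scope.launch {
            delay(delayMs)
            attemptDiscovery(gatt)
        }
    }

    // Stops any armed timeout or scheduled retry so they cannot fire late
    private fun cancelPendingWork() {
        timeoutJob?.cancel()
        retryJob?.cancel()
        timeoutJob = null
        retryJob = null
    }

    private fun calculateBackoff(attempt: Int): Long {
        val exponentialDelay = baseDelayMs * (1L shl (attempt - 1))
        val jitter = (0..500).random().toLong()
        return minOf(exponentialDelay + jitter, MAX_BACKOFF_MS)
    }

    companion object {
        private const val TAG = "ServiceDiscovery"
        const val SERVICE_DISCOVERY_TIMEOUT_MS = 10_000L
        const val MAX_BACKOFF_MS = 8_000L
    }
}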
Key design decisions in this implementation:
- Increasing timeout on retries. If the first attempt times out, the peripheral or stack may need more time. We add 2 seconds per retry attempt.
- Cache refresh on attempt 2+. If the first retry also fails, a stale cache might be the root cause. If the cache was already valid, clearing it only costs a slower, full over-the-air rediscovery.
- Jitter on the backoff delay. If multiple BLE operations are being retried simultaneously (common in multi-device scenarios), jitter prevents them from colliding.
- Hard maximum of 3 retries. Beyond three attempts, the issue is unlikely to resolve itself. Escalate to a full disconnect/reconnect cycle or surface the error to the user.
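To see the schedule these decisions produce, the backoff calculation can be pulled out as a standalone function with the jitter source injected. This is a sketch that mirrors `calculateBackoff()` above; the function name and parameters are hypothetical, and the defaults are the article's values (1000ms base, 0-500ms jitter, 8000ms cap).

```kotlin
import kotlin.random.Random

/**
 * Standalone version of calculateBackoff() with an injectable Random,
 * so the schedule can be inspected and tested deterministically.
 * attempt is 1-based: 1 -> base, 2 -> 2x base, 3 -> 4x base, ...
 */
fun backoffDelayMs(
    attempt: Int,
    baseDelayMs: Long = 1_000,
    maxBackoffMs: Long = 8_000,
    maxJitterMs: Int = 500,
    random: Random = Random.Default
): Long {
    require(attempt >= 1) { "attempt is 1-based" }
    require(maxJitterMs >= 0) { "jitter must be non-negative" }
    // Cap the shift so very large attempt numbers cannot overflow
    val exponential = baseDelayMs * (1L shl minOf(attempt - 1, 20))
    val jitter = random.nextInt(0, maxJitterMs + 1).toLong()
    return minOf(exponential + jitter, maxBackoffMs)
}
```

With `maxRetries = 3`, the manager only asks for attempts 1 to 3, so the schedule is roughly 1s, 2s, and 4s plus jitter; the 8s cap only comes into play if you raise `maxRetries`.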
Clearing Cache on Bond State Changes
When a device's bond state changes (bonding is created, removed, or re-established), the GATT cache frequently becomes invalid. This is particularly common during the following scenarios:
- The user manually removes the bond from Android's Bluetooth settings
- The peripheral's firmware is updated and it generates new encryption keys
- The user factory-resets the peripheral
- A bonding attempt fails and the bond state transitions from BOND_BONDING back to BOND_NONE
Register a BroadcastReceiver to listen for bond state changes and proactively clear the cache:
class BondStateReceiver(
    private val onBondStateChanged: (BluetoothDevice, Int, Int) -> Unit
) : BroadcastReceiver() {
    override fun onReceive(context: Context, intent: Intent) {
        if (intent.action != BluetoothDevice.ACTION_BOND_STATE_CHANGED) return
        // getParcelableExtra(String) is deprecated from API 33; use the typed overload there
        val device: BluetoothDevice = (
            if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.TIRAMISU) {
                intent.getParcelableExtra(
                    BluetoothDevice.EXTRA_DEVICE, BluetoothDevice::class.java
                )
            } else {
                @Suppress("DEPRECATION")
                intent.getParcelableExtra<BluetoothDevice>(BluetoothDevice.EXTRA_DEVICE)
            }
        ) ?: return
        val previousState = intent.getIntExtra(
            BluetoothDevice.EXTRA_PREVIOUS_BOND_STATE,
            BluetoothDevice.BOND_NONE
        )
        val newState = intent.getIntExtra(
            BluetoothDevice.EXTRA_BOND_STATE,
            BluetoothDevice.BOND_NONE
        )
        Log.d(TAG, "Bond state changed: ${device.address} " +
            "$previousState -> $newState")
        onBondStateChanged(device, previousState, newState)
    }
}
// In your connection manager:
private val bondReceiver = BondStateReceiver { device, prevState, newState ->
val gatt = activeConnections[device.address] ?: return@BondStateReceiver
when {
newState == BluetoothDevice.BOND_BONDED -> {
// New bond established, refresh and re-discover
refreshGattCache(gatt)
handler.postDelayed({
discoveryManager.startDiscovery(gatt)
}, 1600)
}
prevState == BluetoothDevice.BOND_BONDED &&
newState == BluetoothDevice.BOND_NONE -> {
// Bond removed, clear cache and disconnect
refreshGattCache(gatt)
gatt.disconnect()
}
}
}
Register this receiver with an IntentFilter for BluetoothDevice.ACTION_BOND_STATE_CHANGED. Make sure to register it before initiating any connections, and unregister it when your connection manager is destroyed.
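The receiver lambda above covers two of the four scenarios listed earlier. Pulling the decision into a pure function makes it testable off-device and gives the failed-bonding case an explicit branch. In this sketch the constants mirror `BluetoothDevice.BOND_NONE` (10), `BOND_BONDING` (11), and `BOND_BONDED` (12); `BondAction` and `bondAction()` are hypothetical, and refreshing after a failed bonding attempt is an assumption based on the scenario list, not something the code above does.

```kotlin
// Same values as BluetoothDevice.BOND_NONE / BOND_BONDING / BOND_BONDED
const val BOND_NONE = 10
const val BOND_BONDING = 11
const val BOND_BONDED = 12

enum class BondAction { REFRESH_AND_REDISCOVER, REFRESH_AND_DISCONNECT, REFRESH_ONLY, NONE }

/** Hypothetical decision table for bond-state transitions. */
fun bondAction(previousState: Int, newState: Int): BondAction = when {
    // New bond established: cached database may predate the bond
    newState == BOND_BONDED -> BondAction.REFRESH_AND_REDISCOVER
    // Bond removed (e.g. from Android's Bluetooth settings)
    previousState == BOND_BONDED && newState == BOND_NONE -> BondAction.REFRESH_AND_DISCONNECT
    // Failed bonding attempt (assumption: refresh, but keep the connection)
    previousState == BOND_BONDING && newState == BOND_NONE -> BondAction.REFRESH_ONLY
    else -> BondAction.NONE
}
```

On-device, the receiver itself is registered with `context.registerReceiver(bondReceiver, IntentFilter(BluetoothDevice.ACTION_BOND_STATE_CHANGED))` and removed with `context.unregisterReceiver(bondReceiver)`.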
OEM-Specific Discovery Quirks
Beyond Samsung's caching behavior, several other OEM-specific issues affect service discovery. Here is a summary of what we have observed in production across the most common devices in the Canadian market:
Samsung (One UI / Android 10-14)
- Aggressive GATT caching as described above
- The 1600ms bonded-device delay is almost mandatory
- Status 133 is disproportionately common and usually indicates a timing issue rather than a real protocol error
- Samsung's Bluetooth stack keeps a limited number of cached GATT databases, roughly 30. On devices that connect to many BLE peripherals, old caches are evicted, which paradoxically fixes the stale-cache problem for those devices
Google Pixel (Android 12-15)
- Generally the most reliable for service discovery
- The 600ms delay is still needed but failures are rare
- Pixel phones are more likely to return accurate error codes, making debugging easier
- The Bluetooth stack can occasionally crash entirely during rapid connect/disconnect cycles, requiring the user to toggle Bluetooth off and on
Xiaomi / Redmi (MIUI)
- MIUI's battery optimization can kill the Bluetooth stack process during discovery, producing status 133 or no callback at all
- Service discovery may return empty service lists even with GATT_SUCCESS
- The hidden refresh() method works but sometimes requires two calls on MIUI 14+
Huawei (EMUI / HarmonyOS)
- Similar to Xiaomi with aggressive battery optimization interference
- Some Huawei devices cap the number of concurrent BLE connections at 4, and attempting to discover services on a 5th connection will silently fail
- The GATT cache clearing method works reliably
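If you act on these observations in code, keeping them in one lookup table makes them easy to audit and to adjust as your analytics change. The sketch below is hypothetical: `OemDiscoveryProfile` and `profileFor()` are illustrative names, the values come from the observations above, and on-device the input would be `Build.MANUFACTURER`. Treat the values as starting points to validate, not guarantees.

```kotlin
/** Hypothetical per-OEM tuning values derived from the observations above. */
data class OemDiscoveryProfile(
    val unbondedDelayMs: Long = 600,
    val bondedDelayMs: Long = 1600,
    val refreshCallsNeeded: Int = 1,
    val maxConcurrentConnections: Int? = null  // null = no known cap
)

fun profileFor(manufacturer: String): OemDiscoveryProfile =
    when (manufacturer.trim().lowercase()) {
        // MIUI 14+ sometimes needs refresh() called twice
        "xiaomi", "redmi" -> OemDiscoveryProfile(refreshCallsNeeded = 2)
        // Some Huawei models silently fail past 4 concurrent connections;
        // applying the cap to all Huawei devices is a conservative choice
        "huawei" -> OemDiscoveryProfile(maxConcurrentConnections = 4)
        // Samsung, Google and everyone else use the baseline delays
        else -> OemDiscoveryProfile()
    }
```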
Putting It All Together: A Production-Ready Discovery Flow
Here is the complete flow we recommend for service discovery in production BLE applications. This pattern handles the timing issues, caching problems, and OEM quirks described throughout this article:
class BleConnectionManager(
private val context: Context,
private val scope: CoroutineScope = CoroutineScope(Dispatchers.Main + SupervisorJob())
) {
private val discoveryManager = ServiceDiscoveryManager(scope)
private val activeGatts = mutableMapOf<String, BluetoothGatt>()
private val gattCallback = object : BluetoothGattCallback() {
override fun onConnectionStateChange(
gatt: BluetoothGatt, status: Int, newState: Int
) {
when (newState) {
BluetoothProfile.STATE_CONNECTED -> {
activeGatts[gatt.device.address] = gatt
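// Named discoveryDelayMs so it does not shadow the coroutine delay() function
val discoveryDelayMs = if (gatt.device.bondState ==
BluetoothDevice.BOND_BONDED) 1600L else 600L
scope.launch {
delay(discoveryDelayMs)
discoveryManager.startDiscovery(gatt)
}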
}
BluetoothProfile.STATE_DISCONNECTED -> {
activeGatts.remove(gatt.device.address)
gatt.close()
}
}
}
override fun onServicesDiscovered(
gatt: BluetoothGatt, status: Int
) {
when (status) {
BluetoothGatt.GATT_SUCCESS -> {
val services = gatt.services
if (services.isNullOrEmpty()) {
discoveryManager.onAttemptFailed(
gatt, "empty_services"
)
} else {
discoveryManager.onDiscoverySuccess(gatt)
onReady(gatt, services)
}
}
BluetoothGatt.GATT_INSUFFICIENT_AUTHENTICATION,
BluetoothGatt.GATT_INSUFFICIENT_ENCRYPTION -> initiateBonding(gatt.device)
else -> discoveryManager.onAttemptFailed(
gatt, "status_$status"
)
}
}
}
private fun onReady(
gatt: BluetoothGatt,
services: List<BluetoothGattService>
) {
// Connection is fully ready for use
Log.i(TAG, "Discovery complete: ${services.size} services")
listener?.onDeviceReady(gatt)
}
}
This is not the simplest possible code. But simple code does not survive contact with the Android BLE ecosystem. Every delay, every retry, and every cache refresh in this pattern exists because a real device, in a real user's hands, failed without it.
Monitoring and Diagnostics
Finally, instrument your service discovery flow. In production, you need to know which devices are failing, what status codes they produce, and how many retry attempts are needed before success. We log the following metrics for every connection:
- Device manufacturer and model (Build.MANUFACTURER, Build.MODEL)
- Android version and SDK level
- Time from STATE_CONNECTED to successful onServicesDiscovered
- Number of retry attempts needed
- Whether a cache refresh was required
- The final status code (or "timeout" / "false_return")
- Bond state at the time of discovery
This data lets you make informed decisions about your delay values, retry counts, and whether you need OEM-specific workarounds. Without it, you are debugging BLE issues in the dark.
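A flat, structured record keeps these metrics consistent across app versions and easy to aggregate. The sketch below is a hypothetical `DiscoveryMetrics` type with one field per metric listed above. On-device, the identity fields would come from `Build.MANUFACTURER`, `Build.MODEL`, `Build.VERSION.RELEASE`, and `Build.VERSION.SDK_INT`, and the timestamps from `SystemClock.elapsedRealtime()`; the key=value log format is an assumption, so swap in whatever your analytics pipeline expects.

```kotlin
/** Hypothetical per-connection record of the discovery metrics listed above. */
data class DiscoveryMetrics(
    val manufacturer: String,
    val model: String,
    val androidVersion: String,
    val sdkInt: Int,
    val connectedAtMs: Long,
    val discoveredAtMs: Long?,    // null if discovery never succeeded
    val retryAttempts: Int,
    val cacheRefreshed: Boolean,
    val finalStatus: String,      // e.g. "0", "133", "timeout", "false_return"
    val bondState: Int
) {
    /** Time from STATE_CONNECTED to a successful onServicesDiscovered. */
    val timeToDiscoveryMs: Long?
        get() = discoveredAtMs?.let { it - connectedAtMs }

    fun toLogLine(): String = listOf(
        "manufacturer=$manufacturer",
        "model=$model",
        "android=$androidVersion",
        "sdk=$sdkInt",
        "time_to_discovery_ms=${timeToDiscoveryMs ?: "n/a"}",
        "retries=$retryAttempts",
        "cache_refreshed=$cacheRefreshed",
        "final_status=$finalStatus",
        "bond_state=$bondState"
    ).joinToString(" ")
}
```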
Service discovery is one of the most fragile steps in the Android BLE connection lifecycle, but with the right combination of timing delays, cache management, and retry strategies, it becomes predictable. Build defensively, instrument thoroughly, and test on as many physical devices as you can get your hands on.
Struggling with BLE service discovery issues on Android? Talk to DEVSFLOW Neuro. We build BLE-connected mobile apps for neurotechnology and medical device companies across Canada.