What happens the moment you press call on an iPhone?

Have you ever wondered about all of the tiny steps that happen when you press call on an iPhone: the steps on the iPhone itself, the process of getting fully connected to the cellular network, and the process of initiating, accepting, and staying connected as you talk to someone on the phone? If so, this ridiculously lengthy and hugely oversimplified post is for you. Without further rambling, let’s get into this!


On the iPhone, the Phone app is just a normal iOS application, powered by CoreTelephony and CallKit. As you tap and lift off the call button, so many things happen.

  1. UIKit dispatches a touch-up event to the app’s view controller. This is the user interface saying here’s where the user just tapped, you need to do something.
  2. The phone number that was typed in is validated for length and for special characters (like # or *) before the call is made. This confirms the number can be routed and detects whether you’re dialing a short code or an emergency number such as 911, 988, or 112, which are routed differently and at higher priority.
  3. Assuming it’s a normal number, it’s handed off to CallKit, which manages the interfaces related to an active phone call, audio routing, and interactions with other applications running on the device.
  4. CallKit asks CoreTelephony (specifically the CommCenter daemon, which controls the cellular stack on iOS) to originate a packet-switched IMS voice call (a VoLTE call, in carrier terms).
  5. CommCenter performs policy checks: for example, whether the device is in airplane mode (and therefore has no access to the cellular network), whether there is a SIM card, whether the SIM card is locked, whether the line you are calling on is an eSIM or a physical SIM card, whether there is more than one SIM card and what the user preferences are, whether Wi-Fi calling is preferred or even available, among other things.
  6. The audio session is configured. iOS allocates the speech path through the audio hardware abstraction layer, sets the category to PlayAndRecord (record in this context means accessing the microphone; it is not recording the phone call itself, although that feature is available in newer versions of iOS), pauses other audio, and prepares the echo and noise suppression systems built into the device’s audio hardware.
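
To make step 2 a little more concrete, here is a minimal sketch of what classifying a dialed string could look like. This is not Apple’s code: the function name, the categories, and the hard-coded emergency set are all assumptions for illustration, and a real phone gets its emergency and short-code lists from the SIM and the carrier rather than from a constant.

```python
import re

# Hypothetical table for illustration only; real lists come from the SIM
# and carrier configuration, not from a hard-coded set.
EMERGENCY_NUMBERS = {"911", "988", "112"}
ALLOWED = re.compile(r"^\+?[0-9*#]+$")


def classify_dialed_number(raw: str) -> str:
    """Return 'emergency', 'service_code', 'short_code', 'normal', or 'invalid'."""
    # Strip the formatting characters a dial pad UI typically inserts.
    dialed = re.sub(r"[\s\-().]", "", raw)
    if not ALLOWED.match(dialed):
        return "invalid"
    if dialed in EMERGENCY_NUMBERS:
        return "emergency"
    if "*" in dialed or "#" in dialed:
        return "service_code"  # star/hash codes such as *67
    digits = dialed.lstrip("+")
    if 3 <= len(digits) <= 6:
        return "short_code"
    if 7 <= len(digits) <= 15:  # E.164 numbers have at most 15 digits
        return "normal"
    return "invalid"
```

Only the "normal" and "emergency" results would go on to place a call; the emergency case is the one that gets flagged for priority handling further down the stack.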

And this is all cool, but the actual call is not even close to starting yet.

At this point, iOS needs to talk to the cellular modem. In Apple’s design, the application processor (the main chip that runs iOS) is separate from the baseband processor (the cellular modem), which Apple has historically purchased from Qualcomm and now builds itself in some newer iPhones. These two processors are linked over a very fast on-device PCIe link. Here’s what happens:

  1. CommCenter serializes an originate-call request, which includes: dialed digits, SIM slot, presentation preferences (caller ID blocking), and the emergency call flag.
  2. This request is placed into the modem’s queue. The application processor signals the modem via an interrupt.
  3. The modem firmware (a real-time operating system, often based on Qualcomm’s AMSS or Apple’s equivalent for the C1 in newer iPhones) loads the command onto its own CPU and begins the radio-side work. From here, the application processor’s only job is to relay status updates and audio. Most of the code and operations making the phone call happen run on the modem’s software, not on the CPU that handles most of the rest of the work on the iPhone.
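
Here is a rough sketch of what “serializing an originate-call request” means in practice. The real message format between CommCenter and the modem is not public, so everything below is an assumption: the field names are taken from the list above, and length-prefixed JSON is just a readable stand-in for whatever binary encoding is actually used.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class OriginateCallRequest:
    """Hypothetical request carrying the fields listed in step 1."""
    dialed_digits: str
    sim_slot: int
    hide_caller_id: bool
    is_emergency: bool

    def serialize(self) -> bytes:
        # JSON is a stand-in for the real (non-public) wire format.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        # A 2-byte length prefix tells the receiver where the message ends.
        return len(payload).to_bytes(2, "big") + payload


def deserialize(frame: bytes) -> OriginateCallRequest:
    """Rebuild the request on the receiving (modem) side of the queue."""
    length = int.from_bytes(frame[:2], "big")
    return OriginateCallRequest(**json.loads(frame[2:2 + length]))
```

The point is only that the request becomes a self-describing chunk of bytes that can sit in a queue until the modem’s own CPU picks it up.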

Your iPhone most likely was already connected to the cellular network before you typed in the phone number. Some exceptions exist, like being on Wi-Fi calling (which is outside the scope of this blog post). Either way, this is what allows the call dialing and connection process to be really fast. The longest part is actually waiting for the phone you called to accept your phone call.

In fact, the modem has already connected to a specific cellular tower and decoded the relevant broadcast information (MIB and SIBs, the Master Information Block and System Information Blocks, which are short messages the tower broadcasts on a schedule to tell phones how to talk to it) and is synchronized to its frame timing. The iPhone has already completed the LTE Attach or 5G registration process (the handshake where the phone identifies itself to the carrier’s core network, gets authenticated, and is assigned the IP-layer resources it needs to send and receive data). It’s authenticated with the SIM card and registered the IMS subscription with the carrier’s IMS core (the set of SIP-based servers, the CSCFs and HSS you’ll meet later, that handle voice calls and messaging as IP traffic rather than over the old circuit-switched telephone network).

Because these processes have already happened at some point, the cellular carrier already knows your IMS identity (IMPI and IMPUs, which are values derived from your IMSI, the unique number burned into your SIM card that identifies you to the cellular network; the IMPI is your private identity used for authentication, and IMPUs are your public identities, basically the phone numbers and SIP URIs other people can reach you at) and has a SIP registration binding (a record on the carrier’s IMS servers that says “this IMS identity is currently reachable at this IP address,” refreshed periodically so the carrier always knows where to send incoming calls) to your phone’s IP address on the IMS APN (a dedicated network path on the carrier reserved for IMS traffic, kept separate from your regular mobile data so voice signaling doesn’t have to compete with TikTok).
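
As a sketch of how those identities relate to the IMSI: when a SIM has no dedicated IMS application (ISIM) on it, the 3GPP numbering rules let the phone build an IMPI and a temporary IMPU from the IMSI and the carrier’s country and network codes. The function names below are mine, the sample IMSI is a made-up test value, and I am assuming a two-digit network code; with an ISIM present, the identities are simply read from the card instead.

```python
def ims_home_domain(mcc: str, mnc: str) -> str:
    """Build the IMS home domain; the network code is padded to 3 digits."""
    return f"ims.mnc{mnc.zfill(3)}.mcc{mcc}.3gppnetwork.org"


def derive_ims_identities(imsi: str, mnc_length: int = 2) -> dict:
    """Derive a private identity and a temporary public identity from an IMSI.

    mnc_length is an assumption supplied by the caller: network codes are
    2 or 3 digits depending on the country, and the IMSI alone does not say which.
    """
    mcc = imsi[:3]
    mnc = imsi[3:3 + mnc_length]
    domain = ims_home_domain(mcc, mnc)
    return {
        "impi": f"{imsi}@{domain}",
        "temporary_impu": f"sip:{imsi}@{domain}",
    }
```

The temporary IMPU is only used to get registered; the public identities other people actually dial (your phone number as a tel: or sip: URI) are handed back by the network during registration.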

By the time you tap call, the phone has already camped on a cell, has an IP address on the IMS APN, and is logged into the IMS network as a SIP user. These registration steps are all relatively fast, but they don’t need to happen every time you press dial. Staying registered is faster.

If you haven’t been using your phone before picking it up to make a phone call, it’s in RRC_IDLE (LTE) or RRC_INACTIVE/RRC_IDLE (5G), meaning that your device is saving power and is not holding radio resources. RRC, or Radio Resource Control, is the protocol that manages the connection between your phone and the cellular tower. It has three main states you should know about. RRC_IDLE is the deepest sleep: the phone has no active connection to a tower, it’s just listening for pages (the network’s way of saying “hey, someone wants to reach you”) and saving battery. RRC_INACTIVE is a 5G-specific middle state: the phone has released its active radio resources but the tower still remembers it, so reconnecting is faster than from full idle. RRC_CONNECTED is the active state where the phone holds dedicated radio resources and can actually send and receive data. To send things over the network (calls, text, data, etc.), your iPhone needs to transition to RRC_CONNECTED.
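
The three states are easy to picture as a tiny state machine. This is a simplified sketch: the state names come from the paragraph above, but the event labels ("setup", "resume", "suspend", "release") are my own shorthand for the real RRC messages, and LTE only uses the IDLE and CONNECTED states.

```python
from enum import Enum


class RrcState(Enum):
    IDLE = "RRC_IDLE"
    INACTIVE = "RRC_INACTIVE"   # 5G only
    CONNECTED = "RRC_CONNECTED"


# (current state, event) -> next state. Event names are simplified labels.
TRANSITIONS = {
    (RrcState.IDLE, "setup"): RrcState.CONNECTED,
    (RrcState.INACTIVE, "resume"): RrcState.CONNECTED,
    (RrcState.CONNECTED, "suspend"): RrcState.INACTIVE,
    (RrcState.CONNECTED, "release"): RrcState.IDLE,
    (RrcState.INACTIVE, "release"): RrcState.IDLE,
}


def next_state(state: RrcState, event: str) -> RrcState:
    """Apply one event, rejecting transitions the model does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in {state.value}") from None


def can_send_user_data(state: RrcState) -> bool:
    """Only a connected phone holds radio resources for calls, texts, or data."""
    return state is RrcState.CONNECTED
```

Placing a call from a sleeping phone is just the "setup" (or "resume") edge of this graph, which is what the next list walks through.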

  1. The modem performs a random access procedure on the PRACH, selects a preamble, transmits it, and waits for a random access response from the base station on the PDCCH/PDSCH.
    • These acronyms are worth a brief detour. PRACH is the Physical Random Access Channel, a special slice of radio spectrum the tower reserves for phones that want to initiate contact (“random access” because the tower doesn’t know which phone will speak up next, so phones have to take their chances and try). The base station itself is called an eNodeB in LTE and a gNodeB in 5G; same role, different generation, both essentially mean “the radio equipment at the cell tower that handles your phone’s connection.” PDCCH and PDSCH are the two downlink channels the base station uses to reply: the PDCCH (Physical Downlink Control Channel) carries control information like “I heard you, here’s your temporary ID and a timing adjustment,” and the PDSCH (Physical Downlink Shared Channel) carries the larger data payload that goes with it.
  2. The base station replies with a timing advance value (which compensates for propagation delays), a temporary identifier, and an uplink grant (permission to transmit on a specific chunk of radio spectrum at a specific time; without a grant, the phone isn’t allowed to send anything on the uplink, since the tower has to coordinate transmissions from many phones at once to keep them from stepping on each other).
  3. The phone uses that grant to send an RRC Connection Request, identifying itself with the S-TMSI (the SAE Temporary Mobile Subscriber Identity, a short-lived identifier the core network hands out so the phone doesn’t have to broadcast its permanent IMSI over the air every time it connects; using a temporary identifier protects against passive surveillance and identity tracking) and a cause code of mo-Signalling or mo-VoiceCall.
  4. The base station sets up the RRC connection, allocates a Signaling Radio Bearer (a dedicated logical channel between the phone and the tower used specifically for control messages, kept separate from the bearers that will later carry voice or data so that signaling traffic doesn’t compete with user traffic for radio resources), and resumes context if the phone was in RRC_INACTIVE (or similar states).
  5. The phone sends an RRC Connection Setup Complete carrying an initial NAS message, the Service Request, to the core network. NAS, or Non-Access Stratum, is the layer of signaling that runs between the phone and the core network rather than between the phone and the tower; the Service Request is the specific NAS message that tells the core network “this phone is back and wants to do something,” prompting the core to wake up the connection and start setting up resources.
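
The timing advance in step 2 is a nice piece of arithmetic. In LTE, one timing advance step is 16 basic time units, and because the value compensates for the round trip, each step corresponds to roughly 78 meters of distance from the tower. The sketch below assumes LTE numerology (5G scales the step with the subcarrier spacing), and the function names are mine.

```python
SPEED_OF_LIGHT_M_S = 299_792_458
# LTE basic time unit: 1 / (15 kHz subcarrier spacing x 2048-point FFT).
LTE_BASIC_TIME_UNIT_S = 1 / (15_000 * 2048)
TA_STEP_S = 16 * LTE_BASIC_TIME_UNIT_S
# The initial timing advance command in the LTE random access response is 11 bits.
MAX_RAR_TA_COMMAND = 1282


def meters_per_ta_step() -> float:
    """One-way distance covered by a single timing advance step."""
    # Divide by two because the delay being corrected is a round trip.
    return SPEED_OF_LIGHT_M_S * TA_STEP_S / 2


def ta_command_for_distance(distance_m: float) -> int:
    """Timing advance command a tower would send to a phone this far away."""
    command = round(distance_m / meters_per_ta_step())
    return min(max(command, 0), MAX_RAR_TA_COMMAND)
```

A phone one kilometer from the tower ends up with a command of about 13, and the maximum command works out to a cell radius of roughly 100 km.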

Congrats, you’ve wasted 100ms of your life connecting to the LTE network. What happens next?

Your device now needs to tell the core network that you want service. Your iPhone sends a Service Request, a Non-Access Stratum (NAS) message, traveling through the base station up to the MME (LTE) or AMF (5G). The MME (Mobility Management Entity) is the LTE core network node responsible for handling control-plane signaling from phones: things like authentication, tracking which tower the phone is currently camped on, and authorizing service requests. The AMF (Access and Mobility Management Function) is the equivalent node in 5G core networks, doing essentially the same job but in a more modular architecture where different functions (like session management or policy) are split out into separate nodes instead of all living in one box.

  1. The MME/AMF validates the Service Request, checking integrity protection using keys derived from the original attach. Integrity protection is a cryptographic check (technically a message authentication code, or MAC) that the phone attaches to its NAS messages to prove two things: that the message actually came from this specific phone, and that nobody tampered with it in transit. The keys used for integrity protection were established during the original attach, which is the earlier LTE Attach or 5G registration process where the phone authenticated with the network and the two sides agreed on a set of shared secrets. Reusing those keys here is what makes the Service Request fast: the phone doesn’t have to re-authenticate from scratch every time it wants to do something.
  2. The MME/AMF tells the base station to set up radio bearers for the IMS APN (the dedicated network path reserved for IMS voice and signaling traffic, kept separate from regular mobile data). A radio bearer is a logical channel between the phone and the tower with specific QoS properties (bandwidth, latency, priority). Two of them matter for a call: a bearer with QCI 5 for SIP signaling, which is restored now, and a dedicated bearer with QCI 1 for voice media, which is added later while the call is being set up. QCI values, or QoS Class Identifiers, tell the cellular network how to prioritize different types of traffic. For example, call signaling and voice get higher network priority than someone streaming TikTok videos.
  3. The base station reconfigures the RRC connection with these bearers and tells the phone the bearer identifiers and IP-layer details. The phone now has radio resources reserved for IMS signaling, and the voice bearer it gets later will be a guaranteed-bitrate one. Guaranteed bitrate (GBR) means exactly what it sounds like: the network has set aside a minimum amount of throughput for this bearer that it commits to deliver, even when the tower is congested. Most data traffic (web browsing, app updates, video streaming) runs over non-GBR bearers and gets whatever bandwidth is left over after the GBR traffic has been served. Voice calls get GBR treatment because dropping bits or stalling for half a second is fine for a Netflix stream (the buffer absorbs it) but catastrophic for a phone call (the conversation breaks).
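
To see how QCI values translate into treatment, here is a small sketch using three rows as I understand them from the standardized QCI table in the 3GPP policy specifications. The class and function names are mine, and the table is deliberately trimmed to the bearers this post talks about. Note that a lower priority number means more important traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QciProfile:
    qci: int
    guaranteed_bitrate: bool
    priority: int          # lower number = served first
    delay_budget_ms: int
    typical_use: str


# A trimmed, illustrative subset of the standardized QCI table.
STANDARD_QCI = {
    1: QciProfile(1, True, 2, 100, "conversational voice"),
    5: QciProfile(5, False, 1, 100, "IMS signaling"),
    9: QciProfile(9, False, 9, 300, "default best-effort data"),
}


def scheduling_order(qcis):
    """Sort bearer QCIs from most to least important."""
    return sorted(qcis, key=lambda q: STANDARD_QCI[q].priority)


def is_gbr(qci: int) -> bool:
    """True if the network reserves a minimum bitrate for this bearer."""
    return STANDARD_QCI[qci].guaranteed_bitrate
```

With a phone holding all three bearers, `scheduling_order([9, 1, 5])` puts SIP signaling first, voice second, and ordinary data last, and only the voice bearer is guaranteed-bitrate.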

Remember how the last process was mostly about getting the phone connected to the base station on the cell tower? This process was about getting connected to the core network itself, and getting radio resources reserved for the iPhone to make voice calls.

Now the fun (IMS) part begins. IMS, or IP Multimedia Subsystem, is the standardized Voice over IP architecture that cellular carriers use. It runs SIP (Session Initiation Protocol), the application-layer protocol that handles setting up, modifying, and tearing down real-time communication sessions like voice calls, video calls, and messaging, over the dedicated signaling bearer.

  1. The iPhone’s IMS stack composes an INVITE (the SIP message that gets sent over the network to set up the call). The INVITE carries an SDP body that describes the offered media: codecs, payload types, RTP/RTCP ports, DTMF method, and encryption parameters if SRTP is used.
    • A vocabulary detour, since this is a dense sentence:
      • SDP (Session Description Protocol) is a small text-based format for describing what kind of media session you want to have. It’s not a protocol that does anything itself; it’s just a structured description that SIP carries inside its messages. Think of SIP as the envelope and SDP as the letter inside saying “I’d like to talk; here’s what I can speak and where to send the audio.”
      • Codecs: the offered codecs are typically EVS first, then AMR-WB and AMR-NB as fallback, listed in descending order of quality. EVS (Enhanced Voice Services) is the modern codec used for HD Voice. AMR-WB (Adaptive Multi-Rate Wideband) is the older HD codec from the 3G/early-LTE era. AMR-NB (Adaptive Multi-Rate Narrowband) is the basic codec that sounds like a traditional phone call, used as the universal fallback when the other endpoints can’t handle anything better.
      • RTP/RTCP ports: the UDP port numbers where each phone will send and receive the actual audio (RTP) and the periodic quality statistics that go with it (RTCP).
      • DTMF method (RFC 4733): DTMF stands for Dual-Tone Multi-Frequency, the tones you hear when you press digits on a phone keypad. RFC 4733 specifies how to carry those tones inside RTP packets as discrete events rather than as actual audio tones, which is more reliable across modern networks and easier for systems like phone trees to decode.
      • SRTP (Secure RTP): the encrypted version of RTP, used when the call’s audio needs to be protected from eavesdropping. The encryption parameters in the SDP body are how the two phones agree on keys and algorithms before the media starts flowing.
  2. The INVITE is sent to the P-CSCF (Proxy Call Session Control Function), which is the first IMS node and was assigned to the phone during IMS registration. The P-CSCF sits at the edge of the carrier’s IMS network and acts as the phone’s single point of contact for all IMS signaling: every SIP message the phone sends goes to the P-CSCF first, and every SIP message destined for the phone comes through the P-CSCF last. It also handles security functions like IPsec for the signaling traffic and applies basic policy checks before forwarding messages deeper into the IMS network.
  3. The P-CSCF forwards it to the S-CSCF (Serving CSCF), the node inside the carrier’s IMS network that holds your subscriber profile and applies your originating service logic. Originating service logic is the set of rules that determine what happens to a call you’re placing, before it leaves your carrier’s network: call barring (you’ve blocked outgoing calls to certain numbers), supplementary services (call forwarding, three-way calling, etc.), anonymous call rejection, and similar features. The split between P-CSCF and S-CSCF exists so the edge of the network can stay relatively simple and stateless about subscribers, while the heavier per-subscriber logic lives deeper inside, where it can scale independently.
  4. The S-CSCF consults the HSS for your subscriber data if needed, then evaluates filter criteria. If the dialed number is on the same carrier, it routes the INVITE internally. If not, the INVITE is sent through a BGCF to an IBCF, which sends it to the other carrier’s network, sometimes via an SS7-to-SIP gateway if the destination runs on an older network such as a 2G/3G circuit-switched number. The exact technology depends on the carrier, device types, and network technology used.
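
Since the INVITE and its SDP body are both plain text, it helps to see one. The sketch below builds a heavily trimmed INVITE with the codec offer described in step 1. A real IMS INVITE carries many more headers (identity, security, routing, and precondition attributes), and the addresses, ports, tags, and payload type numbers here are placeholder assumptions, not what any particular carrier or iPhone sends.

```python
def build_sdp_offer(local_ip: str, rtp_port: int) -> str:
    """Describe the offered media: codecs in preference order, plus DTMF events."""
    lines = [
        "v=0",
        f"o=- 0 0 IN IP6 {local_ip}",
        "s=-",
        f"c=IN IP6 {local_ip}",
        "t=0 0",
        # Payload types 96-99 are dynamic numbers chosen for this example.
        f"m=audio {rtp_port} RTP/AVP 96 97 98 99",
        "a=rtpmap:96 EVS/16000",
        "a=rtpmap:97 AMR-WB/16000",
        "a=rtpmap:98 AMR/8000",
        "a=rtpmap:99 telephone-event/8000",  # RFC 4733 DTMF events
        "a=fmtp:99 0-15",
        "a=ptime:20",
        "a=sendrecv",
    ]
    return "\r\n".join(lines) + "\r\n"


def build_invite(caller: str, callee: str, local_ip: str, rtp_port: int,
                 call_id: str = "demo-call-1", branch: str = "z9hG4bK-demo",
                 tag: str = "demo-tag") -> str:
    """Wrap the SDP offer in a minimal SIP INVITE (the envelope around the letter)."""
    body = build_sdp_offer(local_ip, rtp_port)
    headers = [
        f"INVITE tel:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP [{local_ip}]:5060;branch={branch}",
        "Max-Forwards: 70",
        f"From: <tel:{caller}>;tag={tag}",
        f"To: <tel:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}@[{local_ip}]:5060>",
        "Supported: 100rel, precondition",
        "Content-Type: application/sdp",
        # Content-Length counts the bytes of the body, not the characters.
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + body
```

Calling `build_invite("+15550100123", "+15550100199", "2001:db8::1", 49152)` gives a message you can print and read top to bottom: headers first, a blank line, then the SDP offer.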

At this point, the INVITE has made it through your carrier’s IMS core and on to the network (or networks) serving the person you’re calling. We’re getting closer to connecting the call itself.

  1. The terminating network looks the dialed number up in its HSS. That lookup is doing something specific: it’s translating the dialed phone number into the recipient’s IMS identity and finding their current SIP registration, which tells the terminating S-CSCF which P-CSCF the recipient’s phone is currently using. That’s the chain of lookups that lets the call find a moving target (a phone that could be anywhere in the carrier’s network).
  2. The terminating S-CSCF and P-CSCF, the recipient-side counterparts of the originating IMS nodes the call passed through earlier, forward the INVITE to the recipient’s phone over its own IMS signaling bearer.
  3. The recipient’s phone’s modem hands the INVITE up to its CommCenter, which hands it up to CallKit, which wakes the iPhone’s Phone app and plays the ringtone, vibrates the Taptic Engine, displays caller ID, and shows the answer UI. The caller ID resolution is itself a small chain of lookups: the SIP INVITE’s From header carries the calling number, iOS checks that number against the local Contacts database to display a name if there’s a match, and falls back to displaying just the number if there isn’t. Some carriers also layer network-side spam labeling on top of this, and iOS adds features like Live Voicemail and call screening for unknown numbers. This whole step is essentially the mirror image of steps 1-6 on the originating side, running in reverse: the INVITE travels up the stack from modem to system framework to app, exactly the way the original call request traveled down.
  4. The recipient’s phone sends back a 180 Ringing SIP response. This travels along the same path to the calling phone. This takes under a second.
  5. Your phone, upon receiving 180 Ringing, plays back the ringback tone. By default this is generated locally on the phone (which is why you hear it instantly), but some carriers and configurations supply the ringback as media from the network instead, which is how features like custom ringback tones or carrier-branded audio work.
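
The caller ID lookup from step 3 is simple enough to sketch. This assumes the calling number arrives in the From header as a tel: or sip: URI and that contacts are stored keyed by the full international number; both are simplifications, and the function names and sample header are made up for illustration.

```python
import re


def extract_number(from_header: str) -> str:
    """Pull the calling number out of a From header, or '' if there is none."""
    match = re.search(r"<(?:sip|tel):(\+?\d+)", from_header)
    return match.group(1) if match else ""


def display_name_for(from_header: str, contacts: dict) -> str:
    """Show the contact's name on a match, otherwise fall back to the number."""
    number = extract_number(from_header)
    if not number:
        return "Unknown Caller"
    return contacts.get(number, number)
```

For example, with `{"+15550100123": "Alex"}` as the contacts, a header of `<sip:+15550100123@ims.example.net;user=phone>;tag=abc` resolves to "Alex", and the same header with an empty contact list falls back to the bare number.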

But we’re getting ahead of ourselves. IMS uses the precondition mechanism (RFC 3262 and RFC 3312) to ensure that both ends have the QoS resources they need before the call actually rings. The precondition mechanism is a SIP extension that lets the two sides of a call agree on a list of things that must be true before the call can proceed past a certain point. In IMS, the precondition is almost always “both ends have a dedicated voice bearer with guaranteed bitrate reserved and ready to carry media.” Without preconditions, you could end up in an awkward situation where the recipient answers the call but the audio path isn’t actually established yet, so the first second of conversation is lost. Preconditions exist to prevent that by gating the user-visible part of the call (the ringing, the answer) on the network-side part of the call (the bearer setup) being complete on both ends.

  1. The 183 Session Progress response carries an SDP answer with the chosen codec and the recipient’s RTP endpoint (the IP address and UDP port pair where the recipient’s phone is expecting to receive the actual audio packets once the call connects).
  2. Both phones signal their core network to activate the dedicated voice bearer (QCI 1, guaranteed bitrate, low latency). The PCRF authorizes the bearer, and the P-GW/UPF installs the matching packet filters.
    • A vocabulary detour for the three new acronyms:
      • PCRF (Policy and Charging Rules Function): the policy decision node in the LTE core network. It’s responsible for answering questions like “is this subscriber allowed to have a guaranteed-bitrate voice bearer right now?” based on their plan, the carrier’s policies, and current network conditions. In 5G, the equivalent function is called the PCF (Policy Control Function), which does the same job in a more modular architecture. The PCRF doesn’t carry any traffic; it just makes decisions and tells other nodes to enforce them.
      • P-GW (LTE) and UPF (5G): the gateway node where the carrier’s network meets the wider internet. In LTE it’s the Packet Data Network Gateway (P-GW); in 5G it’s the User Plane Function (UPF). This is the node that actually carries user traffic, including voice packets, and it’s where policies from the PCRF/PCF get enforced. Every voice packet, every web request, every video stream on cellular passes through this gateway at some point.
      • Packet filters: the rules the P-GW/UPF uses to identify which packets belong to which bearer. Since multiple bearers share the same physical network connection, the gateway needs a way to sort packets into the right pipes (voice goes on the QCI 1 bearer, web traffic goes on the default best-effort bearer, and so on). Packet filters typically match on things like source/destination IP, port numbers, and protocol, and they’re how the QoS treatment you negotiated actually gets applied to specific traffic flows.
  3. PRACK (Provisional Acknowledgment, RFC 3262) and UPDATE messages handle the precondition negotiation between the two phones. PRACK is the mechanism for reliably acknowledging the 183 Session Progress response from step 1 of this list, since standard SIP only requires acknowledgment for final responses to an INVITE (like 200 OK), not for provisional ones. UPDATE messages then carry the actual precondition status: each side tells the other when its bearer is set up and ready, and once both sides confirm, the preconditions are satisfied and the call can proceed.
  4. With preconditions confirmed on both sides, the recipient’s phone finally alerts the user and sends a 180 Ringing response back to the calling phone. This is the moment everything in the IMS section has been building toward: the SIP signaling, the bearer setup, the precondition negotiation, all of it was setup for this. The recipient’s screen lights up, the ringtone plays, the Taptic Engine buzzes, and the calling phone gets the 180 Ringing it needs to start its ringback tone. The call is now in the user’s hands.
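
The gating logic behind preconditions boils down to “don’t ring until both bearers are ready.” Here is a minimal sketch of that rule from the recipient phone’s point of view. The class and method names are hypothetical; a real IMS stack tracks this through SDP attributes in the 183, PRACK, and UPDATE messages rather than two booleans.

```python
from dataclasses import dataclass


@dataclass
class PreconditionState:
    """Tracks whether each end has its dedicated voice bearer reserved."""
    local_bearer_ready: bool = False
    remote_bearer_ready: bool = False

    def on_local_bearer_activated(self) -> None:
        # The network has confirmed this phone's QCI 1 bearer.
        self.local_bearer_ready = True

    def on_remote_update(self, remote_ready: bool) -> None:
        # The other side reported its bearer status in an UPDATE.
        self.remote_bearer_ready = remote_ready

    def may_alert_user(self) -> bool:
        # Ring (and send 180 Ringing) only once both ends have resources.
        return self.local_bearer_ready and self.remote_bearer_ready
```

Until both flags flip, the recipient’s phone stays silent; the moment they do, it can ring and send the 180 Ringing described in step 4.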

Exact implementations vary as to when the ringtone starts playing (sometimes it plays while preconditions are still being negotiated).

At this point, after about 30 tasks, the phone finally begins ringing!

  1. The recipient accepts the phone call. Their CallKit reports the answer to CommCenter, which tells the modem to send a 200 OK SIP response carrying the final SDP.
  2. Your phone receives the 200 OK, stops the local ringback tone, sends an ACK to complete the SIP three-way handshake, and switches CallKit into the active call state. The SIP three-way handshake is the INVITE / 200 OK / ACK exchange that establishes a SIP session: the caller sends INVITE to propose the call, the recipient sends 200 OK to accept it, and the caller sends ACK to confirm receipt. It’s structurally similar to the TCP three-way handshake (SYN / SYN-ACK / ACK), and for similar reasons: both ends need to confirm that the other end is ready before the session is considered established.

Voice calls can actually be encoded in a few ways, and the actual audio travels over RTP, not SIP. RTP (Real-time Transport Protocol) is the standard for carrying real-time media like voice and video over IP networks. Unlike SIP, which handles call setup and signaling, RTP carries the actual audio packets once the call is connected. The split exists because the two jobs have very different requirements: SIP needs to be reliable and orderly (which is why it can run over TCP or UDP), while RTP needs to be fast and low-latency, even if a few packets get lost along the way (which is why it runs over UDP and has its own mechanisms for dealing with loss).

  1. Each device captures audio from its microphone and encodes it to meet network bitrate limits and other requirements. The audio pipeline is: microphone signal -> ADC (Analog-to-Digital Converter) -> audio DSP (echo cancellation against the speaker output) -> noise suppression -> automatic gain control -> speech encoder.
  2. The encoder produces a speech frame every 20 ms. Each frame is wrapped in an RTP header (sequence number, timestamp, SSRC, payload type) and sent as a UDP packet to the other phone’s RTP endpoint. SSRC stands for Synchronization Source, a randomly chosen 32-bit identifier that tags this particular media stream within the RTP session, so the receiving side can tell apart packets from different sources if multiple are present (in a one-on-one call there’s only one SSRC per direction, but the field exists because RTP was designed with conference calls and multi-party media in mind). The destination IP and port were exchanged earlier in the SDP.
  3. RTP packets traverse the audio bearer with QCI 1 priority. They go through the carrier’s P-GW/UPF, possibly cross to the other carrier’s network through interconnect, and arrive at the recipient. A media gateway may sit in the middle if transcoding is needed (for example, when two different encodings are used, or if EVS is not supported by the receiving network). Transcoding is the process of decoding audio from one codec and re-encoding it in another in real time. A media gateway is the network node that does this work: it sits in the media path (not the signaling path), receives RTP packets in one format, runs them through a decoder-encoder pair, and sends them out in the format the other side expects. Transcoding is generally avoided when possible because it adds latency and can degrade audio quality (each encode/decode cycle is lossy), but it’s sometimes the only way to bridge two networks with incompatible codec support.
  4. The receiving side has a jitter buffer (around 40 ms to 80 ms). The jitter buffer is a small queue that holds incoming audio packets briefly before sending them to the decoder, smoothing out variability in network arrival times so the audio plays back at a steady rate even when packets arrive irregularly. As delayed packets come in, they are reordered by sequence number, and frames are sent to the decoder at a steady rate. The codecs also have packet loss concealment to reduce audible glitches when either party is on a poor connection. Packet loss concealment is the codec’s algorithm for generating a plausible replacement frame when an expected packet doesn’t arrive in time: the decoder essentially makes a short educated guess based on the surrounding audio, which sounds better than a sudden gap or click. The replacement isn’t perfect, but for occasional single-frame losses, listeners usually don’t notice.
  5. The decoder produces PCM samples (Pulse Code Modulation, the standard format for representing audio digitally: a stream of numbers, each one a measurement of the audio waveform’s amplitude at a specific moment in time, sampled thousands of times per second), the audio DSP applies any final processing, the DAC converts the signal to analog (DAC stands for Digital-to-Analog Converter, the mirror of the ADC from step 1; it takes the stream of PCM numbers and turns them back into the continuous voltage signal that can drive a speaker or headphone), and the audio device in or connected to the iPhone plays it.
  6. RTCP packets are sent during the call every few seconds, carrying quality statistics: jitter (variability in packet arrival times), packet loss, and RTT (Round-Trip Time, the time it takes for a packet to travel from one end of the call to the other and for a response to come back). These stats give both phones and the network ongoing visibility into how the call is performing. If LTE coverage degrades, the network can trigger SRVCC (Single Radio Voice Call Continuity), the mechanism that transfers an in-progress VoLTE call to a 2G/3G circuit-switched bearer when the phone moves out of LTE coverage. SRVCC is one of the more impressive pieces of cellular engineering: the phone stays on a single radio at a time (which is what the “single radio” in the name refers to), but the network coordinates the handoff so the call doesn’t drop, switching the call from packet-switched VoLTE to old-school circuit-switched mid-conversation.
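
Two pieces of the media path are compact enough to sketch: the 12-byte RTP header that wraps each speech frame, and a toy jitter buffer that puts packets back in order. The header layout follows the RTP standard (version 2, no padding, no extension, no contributing sources). The jitter buffer is a deliberately simple illustration of my own: it has a fixed depth measured in frames and none of the adaptive sizing a real one uses.

```python
import struct

RTP_VERSION = 2


def pack_rtp(seq: int, timestamp: int, ssrc: int, payload_type: int,
             payload: bytes, marker: bool = False) -> bytes:
    """Prefix one encoded speech frame with a fixed 12-byte RTP header."""
    byte0 = RTP_VERSION << 6  # no padding, no extension, zero CSRCs
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def unpack_rtp(packet: bytes) -> dict:
    """Split a packet back into header fields and the speech frame."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {"version": byte0 >> 6, "payload_type": byte1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc, "payload": packet[12:]}


class JitterBuffer:
    """Toy fixed-depth buffer: reorders by sequence number, reports losses."""

    def __init__(self, depth_frames: int = 3):  # 3 frames x 20 ms = 60 ms
        self.depth = depth_frames
        self.pending = {}
        self.next_seq = None

    def push(self, packet: bytes) -> None:
        fields = unpack_rtp(packet)
        if self.next_seq is None:
            self.next_seq = fields["seq"]
        # Drop packets that arrive after their slot has already been played.
        if ((fields["seq"] - self.next_seq) & 0xFFFF) >= 0x8000:
            return
        self.pending[fields["seq"]] = fields["payload"]

    def pop(self):
        """Called every 20 ms; returns ('frame', data), ('lost', None), or ('wait', None)."""
        if self.next_seq in self.pending:
            frame = self.pending.pop(self.next_seq)
            self.next_seq = (self.next_seq + 1) & 0xFFFF
            return ("frame", frame)
        if len(self.pending) >= self.depth:
            # Later packets have piled up, so give up on this one and let
            # the decoder run packet loss concealment for the gap.
            self.next_seq = (self.next_seq + 1) & 0xFFFF
            return ("lost", None)
        return ("wait", None)
```

Pushing packets 10, 12, 11 and then popping three times yields the frames in order 10, 11, 12. A "lost" result is the cue for the decoder’s packet loss concealment to fill in the missing 20 ms.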

While all of this is very interesting, there’s more. Throughout a phone call, the cellular stack is doing loads of work in the background that you might otherwise never know about.

  • Handovers: When you are traveling, your phone automatically switches the cellular tower you’re connected to, including during active voice calls.
  • Active power management: During a phone call, the modem transmits with enough power to preserve signal quality while avoiding wasting battery.
  • Keepalive signaling: In the background, messages are sent to the base station periodically to keep the device connected to the network, including IMS service.

When it’s time to say goodbye, the phone that hangs up sends a SIP BYE message and the other side responds with a SIP 200 OK. The network tears down the dedicated voice bearer while keeping the IMS registration and signaling bearer active. CallKit returns the audio session to its previous state (e.g. silence, or resuming your music). After a period of inactivity, the modem goes back to RRC_IDLE.

Next time you tap dial, remember how many steps go into making your call happen and how connections that feel almost instant actually aren’t.
