Troubleshooting TCP/IP Connectivity Issues: A Practical Guide for Admins

Last updated January 27, 2026

TCP/IP issues are some of the most time-consuming incidents in operations because the symptoms are often vague (“can’t connect”, “slow”, “intermittent”), while the root cause can live anywhere from a bad cable to an application-level timeout. The fastest path to resolution is not guessing—it’s using a repeatable workflow that narrows the failure domain at each step.

This article lays out a practical TCP/IP troubleshooting approach for IT administrators and system engineers. It assumes you have console or remote access to endpoints and at least some visibility into network devices (switches, firewalls, routers), but it also shows how to make progress even when you only control the host. Along the way, you’ll see real incident-style scenarios and how the same process applies across them.

The key mindset is to treat “connectivity” as a chain: link state, IP configuration, neighbor resolution, routing, filtering, name resolution, and finally transport and application behavior. A break anywhere in that chain can look like the same user complaint.

Start by scoping the problem and defining “connectivity”

Before running commands, pin down what “doesn’t work” means in measurable terms. A TCP connection to a specific port failing is a different problem than DNS not resolving a name, and both are different from “internet is slow.” The more precisely you define the failing transaction, the fewer false leads you’ll chase.

Begin with four scoping questions:

First, identify the endpoints and directionality: which source host cannot reach which destination (IP and name), and is it one-way or both ways? Many incidents are asymmetric (return path issues, stateful firewall drops, policy-based routing), and you won’t see that if you only test from one side.

Second, determine the protocol and port. “Can’t access the web app” is usually TCP 443, while “can’t join the domain” can involve Kerberos (TCP/UDP 88), LDAP (389/636), SMB (445), and DNS (53). TCP/IP troubleshooting is much faster when you test the exact 5-tuple involved: source IP/port, destination IP/port, and protocol.

Third, decide whether the failure is absolute or intermittent. Intermittency suggests duplex mismatch, marginal wireless, congestion, ECMP hashing changes, unstable routing, or flaky NAT state. Absolute failures are more commonly addressing, routing, ACL, or service binding problems.

Finally, capture a baseline symptom with one or two reproducible tests. For example: “From host A, TCP connect to 10.20.30.40:443 times out; ICMP ping to 10.20.30.40 succeeds; DNS resolves correctly.” That one sentence already constrains the search to transport/port filtering/service issues rather than routing.

A useful pattern is to build a tiny matrix of tests that you can rerun after each change:

  • Reachability to destination IP (ICMP or equivalent).
  • TCP connect to the port.
  • Name resolution (forward and reverse if relevant).
  • Application-level request (HTTP status, TLS handshake, etc.).

With scope defined, you can move down the stack efficiently.
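That matrix can be captured as a tiny script you rerun after every change. This is a sketch, not a definitive tool: the IP, port, and hostname are placeholders, and `row` is a helper name of our own.

```shell
#!/usr/bin/env bash
# Sketch of the four-row test matrix. The IP, port, and hostname below
# are placeholders: substitute the real failing endpoint before use.
ip="127.0.0.1"; port=443; name="localhost"

row() {  # row LABEL CMD... -> one PASS/FAIL line per matrix entry
  local label="$1"; shift
  if "$@" >/dev/null 2>&1; then echo "PASS  $label"; else echo "FAIL  $label"; fi
}

row "ICMP  $ip"        ping -c 1 -W 2 "$ip"
row "TCP   $ip:$port"  timeout 3 bash -c "exec 3<>/dev/tcp/$ip/$port"
row "DNS   $name"      getent hosts "$name"
row "HTTP  $name"      curl -fsS --max-time 5 "https://$name:$port/"
```

Because each row prints PASS or FAIL regardless of why the underlying command failed, the output stays comparable across reruns, which is the whole point of the matrix.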

Verify link and interface health before blaming IP

Even though this is a TCP/IP guide, many “IP problems” start with the interface not passing frames. Before you dig into routes and firewall rules, verify that the NIC is up, has link, and is using the expected medium and VLAN.

On Windows, start by checking the interface state and basic counters. PowerShell gives you a quick view:

powershell
Get-NetAdapter | Sort-Object Status, Name | Format-Table -Auto Name, Status, LinkSpeed, MacAddress
Get-NetAdapterStatistics -Name "Ethernet" | Format-List

Look for Status = Up, a sane link speed, and counters that make sense. A rapidly increasing ReceivedDiscarded or ReceivedErrors can indicate a physical problem, duplex mismatch, or driver issues. On virtual machines, link errors can reflect vSwitch misconfiguration or an overloaded host.

On Linux, check link state and error counters:

bash
ip -br link
netstat -i
ip -s link show dev eth0
netstat -s | sed -n '1,120p'

If the interface is down, you can bring it up, but treat that as a symptom—why did it go down? Also verify you’re on the expected network. In a trunked environment (hypervisor port groups, container bridges, VLAN tagging), it’s easy to end up on the wrong VLAN, which will later look like “routing” or “firewall” problems.

At this stage, also confirm you aren’t dealing with a simple but common issue: a second interface, VPN client, or virtual adapter stealing default route preference. Hosts with multiple active interfaces can send traffic out the wrong path even when each interface is healthy.

Confirm IP addressing, netmask, gateway, and duplicate IP conditions

Once you know frames can flow, validate the host’s IP configuration. Incorrect IP, subnet mask, or default gateway values can still allow some local traffic and break everything else in ways that are confusing.

On Windows:

powershell
Get-NetIPConfiguration | Format-List
ipconfig /all

Pay attention to:

  • The IPv4/IPv6 address and prefix length.
  • Default gateway presence and whether it’s on-link.
  • DNS servers and whether they’re reachable.
  • DHCP status and lease validity.

On Linux:

bash
ip -br addr
ip route
cat /etc/resolv.conf

A frequent failure mode in enterprise networks is a stale static configuration on a machine that was moved to a new subnet. The host might still reach a few resources (anything on the same L2 segment) but fails for everything else due to a wrong gateway.

Another common issue is duplicate IP addressing. Duplicate IPs can appear as intermittent connectivity, ARP flapping, or “sometimes I reach the wrong server.” On Windows you may see warnings in the event log; on Linux you’ll see unstable neighbor entries.

To detect duplicates pragmatically, you can query for the MAC address associated with a given IP from another host on the same subnet:

bash
arp -an | grep "10.20.30.50"

On Windows:

powershell
arp -a | findstr 10.20.30.50

If the MAC changes repeatedly without a good reason (like VRRP/HSRP failover), suspect a duplicate or an ARP spoofing scenario. In managed networks, your switch’s MAC address table and ARP inspection logs are authoritative.
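If you can't watch the switch directly, a polling loop from a Linux host on the same subnet can surface a flapping MAC. This is a sketch under stated assumptions: `watch_arp` is our own helper, and the poll count and interval are arbitrary.

```shell
# Sketch: poll the neighbor entry for an IP and report MAC changes, which
# suggest a duplicate IP (or legitimate VRRP/HSRP failover -- check context).
watch_arp() {  # watch_arp IP COUNT INTERVAL
  local ip="$1" count="$2" interval="$3" last="" mac i
  for ((i = 0; i < count; i++)); do
    mac=$(ip neigh show "$ip" 2>/dev/null |
          awk '{for (j = 1; j <= NF; j++) if ($j == "lladdr") print $(j+1)}')
    if [ -n "$mac" ] && [ -n "$last" ] && [ "$mac" != "$last" ]; then
      echo "MAC for $ip changed: $last -> $mac"
    fi
    [ -n "$mac" ] && last="$mac"
    sleep "$interval"
  done
  echo "last seen: ${last:-none}"
}
# watch_arp 10.20.30.50 30 2
```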

Scenario 1: “VPN users can’t reach the file server, but ping works”

A common incident pattern: remote users connect to VPN and can ping an internal file server’s IP, but SMB access (TCP 445) fails. Start with the host configuration on the VPN client: the presence of multiple adapters and routes.

In one real-world case, the VPN client pushed a route for the file server subnet but did not push DNS servers, so users resolved the file server name to a public IP via their home DNS. Ping succeeded because they tested the internal IP, but \\fileserver\share failed because the name resolved elsewhere.

That kind of mismatch becomes obvious only after you confirm both IP reachability and name resolution separately. The workflow in later sections will show how to systematically separate those concerns.

Test local stack behavior with loopback and self-reachability

Before assuming the network is broken, confirm the local TCP/IP stack is functioning. Loopback tests are quick and remove the physical network from the equation.

On any system, ping 127.0.0.1 (IPv4) and ping ::1 (IPv6) validate that ICMP is working through the local stack. It doesn’t prove TCP is healthy, but it’s a quick sanity check.

For TCP services hosted on the same machine, test listening sockets and local connects:

On Windows:

powershell
Get-NetTCPConnection -State Listen | Sort-Object LocalPort | Select-Object -First 20

On Linux:

bash
ss -lntp | head -n 30

If an application should be listening on 0.0.0.0:443 (all interfaces) but is only bound to 127.0.0.1:443, remote clients will time out even though the service appears “up” locally. This is a classic “connectivity” incident that is actually application binding.
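A quick Linux sketch can flag that binding mistake by scanning `ss -lnt` local addresses; `is_loopback_bind` is a helper name of our own, and a port that also has a separate wildcard listener will still be printed, so treat the output as a hint.

```shell
# Sketch: flag listeners bound only to loopback; remote clients can't reach these.
is_loopback_bind() {  # true if an `ss -lnt` local address is a loopback bind
  case "$1" in
    127.*|'[::1]'*) return 0 ;;
    *)              return 1 ;;
  esac
}

# column 4 of `ss -lntH` is the local address:port
ss -lntH 2>/dev/null | awk '{print $4}' | while read -r bind; do
  is_loopback_bind "$bind" && echo "loopback-only listener: $bind"
done
```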

Verify neighbor discovery: ARP (IPv4) and NDP (IPv6)

If two hosts are in the same L2 segment, they must be able to resolve each other’s MAC addresses (neighbor resolution) before any IP traffic flows. For IPv4 this is ARP; for IPv6 it’s Neighbor Discovery Protocol (NDP). Failures here can masquerade as routing problems.

From the source host, try reaching a known on-link IP (like the default gateway). Then inspect the neighbor table.

Windows:

powershell
ping -n 1 10.20.30.1
arp -a

Linux:

bash
ping -c 1 10.20.30.1
ip neigh show

A healthy entry will be in a reachable/stale state with a MAC address. If it’s incomplete/failed, either the target is down, you’re in the wrong VLAN, or something is filtering ARP/NDP (less common but possible with strict security policies).

If the default gateway’s MAC cannot be resolved, stop and fix L2/VLAN issues before moving on. No amount of route tweaking on the host will help if it can’t even discover its first hop.

Establish the path: routing tables, policy routing, and asymmetric return

Once local addressing and neighbor resolution look sane, routing becomes the next gate. TCP/IP troubleshooting often fails when people assume “the network routes it.” You want to prove what the host thinks the path is, and then validate that the network will return traffic.

Start with the host routing table.

Windows:

powershell
Get-NetRoute | Sort-Object -Property DestinationPrefix, RouteMetric | Select-Object -First 40
route print

Linux:

bash
ip route show
ip rule show

On Linux, ip rule is critical in environments using policy-based routing (PBR) for multi-homed servers, VRFs, or source-based routing. A route can exist but never be used because a higher-priority policy rule sends the traffic to a different table.

Next, verify that the destination is either on-link (same subnet) or that a default/specific route exists pointing to the correct gateway. If the destination is in a remote subnet and there is no matching route, you will see “Network unreachable” errors on Linux or immediate failures on Windows depending on the API.

Then consider asymmetry. Asymmetric routing means the forward path differs from the return path. Many stateful firewalls and NAT devices require symmetry to match sessions; if return traffic goes around the device, sessions break and you’ll see timeouts.

A quick way to detect asymmetry is to test from both ends and compare traceroutes, or capture packets on the source to see if SYNs leave but SYN-ACKs never return.

Use ICMP and TTL-based tools correctly (ping, traceroute, mtr)

ICMP is not “the internet,” but it’s still a high-signal diagnostic tool. Use it with intent and interpret results with awareness of modern filtering.

Ping for basic reachability and loss patterns

Ping tests whether you can receive an ICMP Echo Reply from a target. If it fails, the cause can be: the host is down, routing is broken, or ICMP is filtered. That ambiguity is why you should pair ping with other tests.

Use ping in two modes:

  • Short sanity checks (one or a few packets) to validate that a change improved reachability.
  • Longer runs to detect intermittent loss or jitter.

Windows:

powershell
ping -n 20 10.20.30.40

Linux:

bash
ping -c 20 10.20.30.40

If you see periodic loss (every N packets), suspect rate limiting, congestion, or an overloaded device. If loss starts after a certain hop (from traceroute/mtr), you can localize it further.

Traceroute and MTR to localize the break

Traceroute (or tracert on Windows) uses increasing TTL (hop limit) to elicit ICMP Time Exceeded messages. It maps the path but can be misleading if devices deprioritize or filter ICMP responses.

Windows:

powershell
tracert -d 10.20.30.40

Linux:

bash
traceroute -n 10.20.30.40

If you can install it, mtr combines traceroute and ping to show loss per hop over time:

bash
mtr -rn 10.20.30.40

Interpretation matters: loss shown at an intermediate hop can be harmless if later hops show no loss; that often indicates ICMP rate limiting on that router, not actual forwarding loss. The real signal is loss that persists to the destination.
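That interpretation rule can be encoded. The sketch below reads simplified per-hop loss figures (our own `hop loss%` input format, not raw mtr output) and applies it: mid-path loss is noise unless it persists through the final hop.

```shell
# Sketch: read "hop loss%" lines and apply the rule from the text:
# loss at an intermediate hop is noise unless it persists to the destination.
persistent_loss() {
  awk '{hop[NR] = $1; loss[NR] = $2}
       END {
         if (NR == 0) exit
         if (loss[NR] == 0) {
           print "destination clean: mid-path loss is likely ICMP rate limiting"
           exit
         }
         start = NR
         for (i = NR; i >= 1 && loss[i] > 0; i--) start = i
         print "real loss likely starts at hop " hop[start]
       }'
}

printf '1 0\n2 30\n3 0\n4 0\n'  | persistent_loss  # mid-path blip only
printf '1 0\n2 10\n3 15\n4 15\n' | persistent_loss # -> real loss likely starts at hop 2
```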

Separate name resolution from IP reachability (DNS as a dependency)

DNS problems commonly masquerade as “TCP/IP connectivity” issues because most user actions begin with a name, not an IP. Even when the network path is fine, the wrong answer from DNS (or no answer) breaks the workflow.

Start by testing the name you actually use in the application, and record the returned IPs. Compare that to what you expect.

Windows:

powershell
Resolve-DnsName app01.corp.example.com
nslookup app01.corp.example.com

Linux:

bash
dig +short app01.corp.example.com
getent hosts app01.corp.example.com

If the name resolves to multiple IPs, confirm that all are valid and reachable from the client network. Load-balanced records can cause “intermittent connectivity” when one backend is down or blocked.
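One way to check that directly is a sketch that resolves a name and TCP-tests every returned IPv4 address, so a single dead backend in a round-robin record stands out; the hostname and port are examples, and `check_all_ips` is our own helper.

```shell
# Sketch: resolve NAME, then attempt a TCP connect to each returned
# IPv4 address; one FAIL among several OKs explains "intermittent" reports.
check_all_ips() {  # check_all_ips NAME PORT
  local name="$1" port="$2" ip
  getent ahostsv4 "$name" | awk '{print $1}' | sort -u | while read -r ip; do
    if timeout 3 bash -c "exec 3<>/dev/tcp/$ip/$port" 2>/dev/null; then
      echo "OK   $ip:$port"
    else
      echo "FAIL $ip:$port"
    fi
  done
}
# check_all_ips app01.corp.example.com 443
```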

Also validate that the client is querying the intended DNS servers. Split-horizon DNS (different answers internally vs externally) is a major source of confusion for VPN, cloud, and hybrid networks.

To test which server answers and what it returns:

bash
dig @10.10.10.10 app01.corp.example.com +noall +answer

On Windows you can set the server in nslookup interactively or use Resolve-DnsName -Server.

Scenario 2: “Only one subnet can’t reach the app by name”

Consider an internal web app reachable by IP from everywhere, but users in a new subnet report failures when using the hostname. Ping to the hostname fails, and the browser spins.

Following the workflow, you would first test IP reachability to the known app VIP; it works. Then you resolve the hostname from a failing client and see it returns an old IP that was retired months ago. The new subnet uses a different DNS forwarder (or conditional forwarder) than the rest of the network, and its zone data was never updated.

This kind of incident is resolved faster when you explicitly treat DNS as a dependency with its own health checks, rather than mixing it into “network connectivity.”

Confirm the destination service is reachable at the transport layer (TCP/UDP)

ICMP reachability is not proof that TCP or UDP works. Firewalls often allow ping but block ports, and services can be down even when the host is up.

For TCP, your goal is to verify whether the three-way handshake can complete. For UDP, you usually need application-specific checks because UDP is connectionless.

TCP port checks from Windows and Linux

Windows PowerShell includes Test-NetConnection:

powershell
Test-NetConnection -ComputerName 10.20.30.40 -Port 443
Test-NetConnection -ComputerName app01.corp.example.com -Port 443

Pay attention to TcpTestSucceeded and whether name resolution returned the expected address.

On Linux, nc (netcat) is commonly available:

bash
nc -vz 10.20.30.40 443

If nc isn’t available, curl can validate TCP and TLS/HTTP in one step:

bash
curl -vk https://10.20.30.40/

A TCP timeout suggests filtering or routing/return-path issues. A connection refused indicates the host was reached but nothing is listening on that port (or a firewall is actively rejecting). A TLS failure indicates the service responded but the application layer is misconfigured (certificates, SNI, protocol versions).
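Those three outcomes can be mapped mechanically from curl’s exit code (7 = could not connect, 28 = operation timed out, 35 = TLS connect error, per curl’s EXIT CODES man section). The wrapper name below is our own.

```shell
# Sketch: translate a curl exit code into the diagnosis buckets above.
classify_connect() {  # classify_connect CURL_EXIT_CODE
  case "$1" in
    0)  echo "handshake and application layer OK" ;;
    7)  echo "refused/unreachable: host rejected, or connect failed outright" ;;
    28) echo "timeout: suspect filtering, routing, or return-path issues" ;;
    35) echo "TLS failure: service responded; check certs, SNI, protocol versions" ;;
    *)  echo "other curl failure (exit $1); see EXIT CODES in man curl" ;;
  esac
}

curl -sk --max-time 5 -o /dev/null https://10.20.30.40/
classify_connect $?
```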

UDP checks without guessing

For UDP services like DNS, NTP, SNMP, or syslog, prefer protocol-aware tools:

  • DNS: dig or Resolve-DnsName
  • NTP: w32tm /stripchart on Windows, chronyc sources on Linux (depending on setup)

Blind UDP “port checks” are often misleading because lack of response might be normal behavior.

Inspect host firewall and endpoint security controls

At this point in the workflow, you have enough evidence to decide whether traffic should be reaching the destination and whether a port is blocked. When TCP connect fails but routing and ARP look healthy, host-based filtering is a common culprit.

On Windows, check Windows Defender Firewall profiles and effective rules. Focus on inbound rules on the destination server and outbound rules on the client if your environment uses restrictive egress.

powershell
Get-NetFirewallProfile | Format-Table Name, Enabled, DefaultInboundAction, DefaultOutboundAction
Get-NetFirewallRule -Enabled True -Direction Inbound | Select-Object -First 30 DisplayName, Action, Profile

On Linux, the tooling varies. On modern distributions you may see nftables, iptables, or a higher-level manager like firewalld or ufw. The important point is to identify whether the port is allowed and on which interface.

Examples:

bash
sudo nft list ruleset | sed -n '1,200p'

bash
sudo iptables -S
sudo iptables -L -n -v | sed -n '1,200p'

Also consider endpoint security products that add network filtering (EDR agents, host IPS). These can drop or proxy traffic in ways that don’t appear in OS firewall rules.

When a port appears open locally but is unreachable remotely, compare local listening state (ss -lntp or Get-NetTCPConnection -State Listen) with the firewall rule scope. Rules might be limited to specific remote IP ranges or specific profiles (Domain vs Public).

Understand NAT and stateful device behavior (where TCP sessions “disappear”)

Network Address Translation (NAT) and stateful firewalls are frequent points of failure in TCP/IP troubleshooting because they rely on session state. A route can exist and a port can be permitted, but if session state isn’t created or maintained correctly, packets won’t make it back.

Common NAT/stateful failure modes include:

  • Asymmetric routing: return traffic bypasses the stateful device.
  • Overlapping RFC1918 space between sites: NAT rules match unexpectedly.
  • Session timeouts too low: long-lived idle TCP sessions break.
  • Port exhaustion on PAT (many-to-one NAT): new outbound connections fail.

From a host, you can’t “see” NAT tables directly, but you can infer state issues with packet capture (SYNs leave, no SYN-ACK returns), with tests from multiple source IPs, and by correlating firewall logs.

If you control a firewall, check whether the session is created and whether return traffic is hitting the same device. On platforms like Palo Alto, FortiGate, and Cisco ASA/FTD, session table inspection is often the fastest way to determine whether you’re dealing with a policy drop, NAT mismatch, or routing asymmetry. The exact commands vary by vendor and software version, so rely on your platform documentation, but the concept is constant: verify a session exists for the 5-tuple and that NAT and policy match expectations.

Scenario 3: “Intermittent API timeouts after a firewall change”

A realistic example: an internal service calls a third-party API over TCP 443. After a firewall maintenance window, the service sees intermittent timeouts, especially during bursts.

Following the workflow, link and IP config are fine, DNS resolves correctly, and curl sometimes works. Packet capture on the service host shows SYNs leaving, but during failures no SYN-ACK returns. On the firewall, you observe the source NAT pool nearing exhaustion during bursts, causing new outbound sessions to fail until older entries age out.

The fix is not “restart the service”; it’s increasing NAT pool capacity, adjusting timeouts appropriately, or spreading egress across more public IPs. This is a textbook case where TCP/IP symptoms (timeouts) are caused by state resource limits.

Use packet capture to turn guesses into facts

When command-line tests plateau—ping works, routes look fine, but TCP connects time out—packet capture is the most direct way to see where the conversation stops. You don’t need to capture everywhere; a capture at the source host (and sometimes at the destination) is usually enough to decide whether the network is dropping outbound packets, inbound replies, or whether the destination is not responding.

Capturing on Linux with tcpdump

Capture only what you need: narrow by host and port to reduce noise.

bash
sudo tcpdump -i eth0 -nn host 10.20.30.40 and tcp port 443 -vv

If the issue is intermittent, write to a file for later analysis:

bash
sudo tcpdump -i eth0 -nn host 10.20.30.40 and tcp port 443 -w /tmp/app01-443.pcap

Key patterns to look for:

  • SYN leaves, no SYN-ACK returns: path/filtering/return-path problem.
  • SYN leaves, SYN-ACK returns, ACK never sent: local host stack/firewall issue.
  • Handshake completes, then retransmits/zero window: application performance or MTU/fragmentation problems.

Capturing on Windows

Windows has built-in packet capture capabilities via pktmon (and ETW-based tooling), and many admins use Wireshark/Npcap where allowed. If you can use pktmon, you can capture and convert to pcapng for analysis.

A minimal pktmon flow is:

powershell
pktmon filter remove
pktmon filter add -i 10.20.30.40
pktmon start --etw -p 0

# reproduce the issue

pktmon stop
pktmon format PktMon.etl -o capture.pcapng

Exact flags and outputs can vary by Windows build; validate on your target OS version. The goal is the same: confirm whether packets leave and whether replies return.

Packet capture also helps validate MTU issues, because you can see ICMP “Fragmentation Needed” messages (IPv4) or ICMPv6 “Packet Too Big” messages when Path MTU Discovery is functioning.

Address MTU and fragmentation issues (the “ping works but app fails” classic)

MTU (Maximum Transmission Unit) is the largest frame payload size that can be carried on a link. Ethernet commonly uses 1500 bytes; VPNs, tunnels, and overlays often reduce effective MTU due to encapsulation. When MTU is mismatched and Path MTU Discovery is blocked, you get failures that are notoriously confusing: small packets (including ping) work, while larger packets (TLS handshakes, file transfers) stall.

This presents as:

  • TCP connections that establish but hang on data transfer.
  • HTTPS sites that partially load or time out during TLS handshake.
  • SMB copies that start then freeze.

To test MTU issues, use ping with the “do not fragment” option (IPv4). On Windows:

powershell
ping -f -l 1472 10.20.30.40

1472 bytes of ICMP payload plus 28 bytes of headers (20-byte IPv4 header + 8-byte ICMP header) equals the standard 1500-byte MTU. Reduce the payload until the ping succeeds; the largest passing size gives you an estimate of the path MTU.

On Linux:

bash
ping -M do -s 1472 10.20.30.40

If you find that packets larger than, say, 1360 bytes fail across a VPN, adjust the tunnel MTU/MSS clamping on the VPN device, or configure the host interface MTU appropriately. Avoid “randomly lowering MTU everywhere”; target the segment or tunnel causing the constraint.
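The shrink-until-it-passes loop can be automated as a binary search. In this sketch the probe is pluggable so the search logic stands alone; `probe_df` wraps the Linux ping shown above, the target is an assumption, and the 28-byte header math matches the earlier example.

```shell
# Sketch: binary-search the largest DF-bit payload that survives the path.
# find_max_payload is generic; probe_df plugs in the real don't-fragment ping.
find_max_payload() {  # find_max_payload PROBE LO HI -> largest passing size (0 if none)
  local probe="$1" lo="$2" hi="$3" mid best=0
  while [ "$lo" -le "$hi" ]; do
    mid=$(( (lo + hi) / 2 ))
    if "$probe" "$mid"; then best="$mid"; lo=$((mid + 1)); else hi=$((mid - 1)); fi
  done
  echo "$best"
}

probe_df() {  # one don't-fragment ping with the given payload size
  ping -M do -s "$1" -c 1 -W 2 "$target" >/dev/null 2>&1
}

target=10.20.30.40   # assumption: replace with the failing destination
# payload=$(find_max_payload probe_df 500 1472)
# echo "estimated path MTU: $((payload + 28))"   # payload + 20 IP + 8 ICMP
```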

Evaluate TCP behavior: retransmissions, windowing, and resets

Even when routing and filtering are correct, TCP performance issues can look like “connectivity problems.” Understanding a few TCP behaviors helps you distinguish between a hard block and a degraded path.

If you see repeated retransmissions in a capture, that indicates packet loss or severe delay. Loss can be due to congestion, bad physical links, queue drops, or policing.

If you see RST (reset) packets, that typically means one side is rejecting the connection. Common causes are:

  • No process is listening on the destination port.
  • A firewall or load balancer is configured to reject rather than drop.
  • An application is actively closing because it doesn’t like the request (less common at pure TCP level).

Windowing problems include “zero window” where a receiver advertises it can’t accept more data, causing the sender to pause. That can be a sign of application backpressure or resource saturation on the receiving host rather than a network defect.

On Linux, you can inspect TCP socket states to see whether connections are stuck in SYN-SENT, SYN-RECV, or ESTABLISHED with unusual patterns:

bash
ss -ant state syn-sent
ss -ant state established | head

On Windows, Get-NetTCPConnection provides similar visibility:

powershell
Get-NetTCPConnection | Group-Object State | Sort-Object Count -Descending

If many connections are stuck in SYN-SENT, suspect filtering or return-path issues. If many are in TIME-WAIT, or you suspect ephemeral port exhaustion on a busy client, examine the client’s ephemeral port usage and connection churn (especially on NAT gateways or proxy services).
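A quick way to see that churn on Linux is to tally socket states and compare the total against the configured ephemeral range; `state_summary` is a helper name of our own.

```shell
# Sketch: tally TCP socket states, then show the configured ephemeral range.
# Large TIME-WAIT or SYN-SENT counts point at churn or return-path problems.
state_summary() {  # reads `ss -ant` output on stdin, prints "count STATE" lines
  awk 'NR > 1 {count[$1]++} END {for (s in count) print count[s], s}' | sort -rn
}

ss -ant 2>/dev/null | state_summary
cat /proc/sys/net/ipv4/ip_local_port_range 2>/dev/null  # usable client port range
```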

Don’t ignore IPv6: dual-stack pitfalls and preference rules

Modern enterprise networks are often dual-stack (IPv4 and IPv6). Connectivity complaints can arise when IPv6 is partially deployed: clients prefer IPv6, but IPv6 routing, firewall rules, or DNS records are incomplete.

The symptom is often: “It works on some machines but not others,” or “It works when I use the IPv4 address.” In a dual-stack environment, a hostname may resolve to both A (IPv4) and AAAA (IPv6) records, and the client will choose based on OS policy and reachability.

To identify this quickly:

  • Resolve the name and see if AAAA records exist.
  • Test connectivity explicitly over IPv6 and IPv4.

Linux:

bash
dig +short A app01.corp.example.com
dig +short AAAA app01.corp.example.com
ping -6 -c 3 app01.corp.example.com
curl -6 -vk https://app01.corp.example.com/

Windows:

powershell
Resolve-DnsName app01.corp.example.com -Type A
Resolve-DnsName app01.corp.example.com -Type AAAA
Test-NetConnection -ComputerName app01.corp.example.com -Port 443 -InformationLevel Detailed

If IPv6 fails but IPv4 works, you can mitigate by fixing IPv6 routing/firewall/DNS rather than disabling IPv6 on clients (which often causes other problems in Windows environments). The goal is consistency: either support IPv6 properly or ensure clients don’t prefer broken IPv6 paths.

Work from the destination backward when server-side visibility is available

So far, the workflow has been source-centric. When you control the destination server, you can significantly accelerate TCP/IP troubleshooting by validating what the server sees.

First, confirm the service is listening on the expected interface and port. A service bound to a specific IP that changed (due to DHCP, migration, or IP failover) will appear “down” to clients.

On Linux:

bash
sudo ss -lntp | grep ':443'

On Windows:

powershell
Get-NetTCPConnection -LocalPort 443 -State Listen

Next, check whether connection attempts arrive at the server. Packet capture is again definitive:

bash
sudo tcpdump -i eth0 -nn tcp port 443 -c 50

If SYNs arrive but the server never responds with SYN-ACK, that’s a local stack or firewall issue. If SYNs do not arrive, the issue is upstream (routing, firewall, NAT, load balancer).

If SYNs arrive and the server responds, but the client never completes the handshake, suspect return path filtering, asymmetric routing, or a client-side firewall. This “server sees SYN, replies, client never ACKs” pattern is common when a firewall in the return path drops SYN-ACK due to policy.
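The server-side decision logic above fits in a few lines; the labels and helper name are ours, and the inputs are the two observations you make from the capture.

```shell
# Sketch: map the two server-side capture observations to the next place to look.
triage() {  # triage SYN_ARRIVES(yes|no) SERVER_REPLIES(yes|no)
  case "$1,$2" in
    no,*)    echo "upstream: routing, firewall, NAT, or load balancer" ;;
    yes,no)  echo "this host: local stack or host firewall" ;;
    yes,yes) echo "return path: asymmetric routing or client-side filtering" ;;
  esac
}

triage yes no   # -> this host: local stack or host firewall
```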

Factor in load balancers and VIPs (where the target isn’t the real server)

Many enterprise applications sit behind a load balancer VIP (virtual IP). TCP/IP troubleshooting changes slightly because “the destination” is a device that may proxy, NAT, or distribute connections.

If clients can’t reach a VIP, test the VIP directly and then test a backend server directly from the same client network (if permitted). If the VIP fails but the backend works, you’re looking at a load balancer listener, health monitor, pool membership, or security policy issue rather than basic routing.

Also consider source IP preservation. Some load balancers SNAT client connections; others preserve source IP and require backend routing to return through the load balancer. If a backend returns directly to the client (bypassing the load balancer), the TCP session breaks. This is a form of asymmetry, but the “stateful device” is the load balancer.

Packet captures on the backend can reveal this quickly: you see SYNs from the client IP, the backend replies, but the reply path doesn’t go back via the load balancer as expected.

Integrate cloud and hybrid specifics (security groups, NSGs, and routing)

In cloud environments, TCP/IP troubleshooting often involves multiple layers of policy: guest OS firewall, cloud security groups, subnet route tables, and sometimes centralized inspection. The workflow remains the same, but you must check each layer systematically.

In Azure, for example, a TCP timeout to a VM can be caused by:

  • NSG rule blocking inbound/outbound.
  • User-defined routes sending traffic to an NVA (network virtual appliance) that drops it.
  • VM OS firewall blocking.
  • Load balancer health probe misconfiguration.

While the article is not a cloud platform manual, it’s worth explicitly adding cloud policy checks to your chain of dependencies: even if the guest OS is perfect, the platform can still block packets.

If you use Azure CLI and have permissions, you can inspect effective NSG rules and route tables; the exact commands depend on your environment and are well documented by Microsoft. The operational point is to validate “effective” policy (what applies to the NIC/subnet) rather than only intended policy.

Build a repeatable workflow: a layered decision tree you can run under pressure

By now, you’ve seen the main dependency chain. The quickest way to apply it during incidents is to run the same high-signal checks in the same order, recording outcomes so you don’t loop.

Start local, then expand outward:

First, validate the interface and IP configuration (link up, correct address, correct gateway, DNS servers). If those are wrong, fix them before touching anything upstream.

Second, validate neighbor resolution to the default gateway (ARP/NDP). This confirms you’re on the right L2 and can reach the first hop.

Third, validate routing: does the host have a route to the destination, and is the correct interface/gateway used? If policy routing is involved, confirm which table applies.

Fourth, validate reachability to the destination IP and then to the destination port. Keep ICMP and TCP tests separate in your notes.

Fifth, validate DNS resolution for the real hostname used by the application, and confirm the returned IP is the one you tested.

Sixth, check local and remote firewalls based on what failed: if SYNs leave but don’t return, look upstream; if SYNs arrive but don’t get answered, look at the destination host; if handshake completes but app fails, look higher (TLS, HTTP, auth).

Finally, when the evidence is ambiguous, capture packets at the source and destination. Packet capture is the fastest way to stop speculation.

This workflow is also how you communicate effectively during incidents: you can tell stakeholders exactly which layer failed and what evidence supports it.

Practical command sequences you can copy/paste (Windows and Linux)

When you’re paged at 2 a.m., having a short “known good” command sequence helps. The goal is not to run every command—it’s to quickly gather the minimum data needed to choose the next step.

Windows quick sequence

Use this sequence from the client first, then from the server if you have access.

powershell

# 1) Interface and IP configuration

Get-NetAdapter | Format-Table -Auto Name, Status, LinkSpeed
Get-NetIPConfiguration | Format-List

# 2) Routing

route print

# 3) DNS resolution

Resolve-DnsName app01.corp.example.com

# 4) Reachability and port test

ping -n 4 10.20.30.40
Test-NetConnection -ComputerName 10.20.30.40 -Port 443

# 5) Listening sockets (on server)

Get-NetTCPConnection -State Listen | Select-Object -First 30 LocalAddress, LocalPort, OwningProcess

Linux quick sequence

bash

# 1) Interface and addressing

ip -br link
ip -br addr

# 2) Routing and policy routing

ip route
ip rule

# 3) DNS

getent hosts app01.corp.example.com

# 4) Reachability and port test

ping -c 4 10.20.30.40
nc -vz 10.20.30.40 443 || true
curl -vk https://10.20.30.40/ --max-time 10 || true

# 5) Listening sockets (on server)

ss -lntp | head -n 40

These sequences align with the layered approach described earlier: establish local health, determine the path, resolve names, then test the actual service.

Tie symptoms to likely layers (without jumping to conclusions)

The value of a TCP/IP troubleshooting workflow is speed: you avoid rabbit holes by mapping symptoms to layers and then validating.

When you see “Network unreachable” or an immediate failure, suspect routing or local configuration. When you see timeouts, suspect filtering, path issues, or stateful device behavior. When you see “connection refused,” suspect service not listening or an active reject. When you see intermittent slowdowns and retransmits, suspect loss, MTU, or congestion.

However, don’t stop at the first plausible explanation. For example, “ping fails” could be ICMP blocked. That’s why pairing tests matters: if ping fails but TCP 443 succeeds, you don’t have a connectivity issue—you have an ICMP policy.

Similarly, if DNS resolves but the returned IP is wrong for that client network, you don’t have a routing problem—you have a name resolution or split-horizon design problem.

Operational habits that reduce repeat incidents

TCP/IP incidents are easier to resolve when you can compare current behavior to known-good baselines. Over time, a few operational habits pay off.

Maintain accurate IPAM (IP address management) and document subnet gateways, DHCP scopes, and reserved ranges. Many duplicate IP incidents occur because teams lose track of what is static vs dynamic.

Standardize logging and visibility: enable firewall session logging for denies (with appropriate rate limiting), keep load balancer health history, and centralize syslogs. When you can correlate “client attempted connection at time T” with “firewall dropped due to rule X,” mean time to resolution collapses.

Finally, keep a small set of “golden path” tests for critical services (DNS, auth, key apps) from each network zone. Synthetic checks that exercise real TCP flows catch issues before users do, and they provide immediate evidence about where the chain broke.