.loc vs .iloc vs Chained Indexing
Think of a DataFrame as a two-dimensional grid. .loc finds cells by the labels printed on the rows and columns, like using street names and house numbers. .iloc finds cells by their 0-based row and column positions (0, 1, 2, ...), which are never printed, like using coordinates on a map. Raw brackets, df[...], look up column labels when given a single label or a list of labels. Slices and boolean masks are the exceptions: they select rows.
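A minimal sketch of the three access paths side by side. It assumes a small hypothetical frame with string row labels, so labels and positions cannot be confused; the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical example data: string row labels keep labels and positions visibly distinct
df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune"], "temp": [4.5, 19.0, 31.2]},
    index=["r1", "r2", "r3"],
)

# .loc: row label "r2", column label "temp"
by_label = df.loc["r2", "temp"]

# .iloc: row position 1 (second row), column position 1 (second column)
by_position = df.iloc[1, 1]

# Raw brackets with a single label: selects the whole "temp" column as a Series
temp_column = df["temp"]

print(by_label, by_position)  # both address the same cell
print(temp_column.tolist())
```

Here .loc and .iloc reach the same cell through different coordinates, while the bracket form never touches rows at all.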
The Setup
You are querying a dataset of orders indexed by unique integer transaction IDs that do not start at 0. You need to pull the record representing the first actual entry in the table, as well as the record with transaction ID 0 if it exists.
What Does This Print?
```python
import pandas as pd

# Dataset with a high-numbered integer index that does not start at 0
data = {'amount': [150.25, 89.90, 412.50]}
indices = [1001, 1002, 1003]
df = pd.DataFrame(data, index=indices)

try:
    # Attempting to fetch the first row with raw brackets
    first_row = df[0]
    print("Success! first_row using raw brackets:")
    print(first_row)
except KeyError as e:
    print(f"Failed using raw brackets: KeyError: {e}")

try:
    # Attempting to fetch the first row with .loc, treating 0 as if it were a position
    loc_row = df.loc[0]
    print("Success! loc_row:")
    print(loc_row)
except KeyError as e:
    print(f"Failed using .loc: KeyError: {e}")
```
The Output
The code prints an error for both approaches:
Failed using raw brackets: KeyError: 0
Failed using .loc: KeyError: 0
Raw brackets df[0] search the column labels, and there is no column named 0 (the only column is 'amount'). .loc[0] searches the row labels for 0. Since the labels are 1001, 1002, and 1003, no such label exists, so this raises KeyError as well.
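Both failures can be confirmed by checking where pandas actually looks. A minimal sketch that rebuilds the same df: the in operator on df.columns and df.index tests label membership, which is what df[0] and df.loc[0] depend on.

```python
import pandas as pd

data = {'amount': [150.25, 89.90, 412.50]}
df = pd.DataFrame(data, index=[1001, 1002, 1003])

# df[0] searches the column labels; the only column is 'amount'
zero_is_column = 0 in df.columns

# df.loc[0] searches the row labels; they are 1001, 1002, 1003
zero_is_row_label = 0 in df.index

# The label that the first row actually carries
first_label = df.index[0]

print(zero_is_column, zero_is_row_label, first_label)
```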
Why pandas Does This
Under the hood, pandas exposes three access paths with distinct rules. .loc selects by label: it resolves keys through the Index object's lookup engine (typically backed by a hash table), so df.loc[0] means "the row labeled 0", never "the first row". .iloc ignores the labels entirely and selects by 0-based integer position, much like indexing a Python list or NumPy array. Plain bracket indexing df[key] is overloaded: on a DataFrame, a single key or a list of keys selects columns, while a slice or a boolean array selects rows; on a Series, s[key] is a label lookup, and older pandas versions fell back to positions for some non-integer indexes, a fallback that recent versions deprecate. When the index holds integers, these overlapping rules are easy to misread.
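A minimal sketch of how the same bracket operator changes meaning with the type of key, using the orders frame from above. It assumes a recent pandas version, in which an integer slice inside [] on a DataFrame with an integer index is treated positionally.

```python
import pandas as pd

df = pd.DataFrame({'amount': [150.25, 89.90, 412.50]}, index=[1001, 1002, 1003])

# A single label inside [] selects a column and returns a Series
amount_col = df['amount']

# A slice inside [] selects rows by position on a DataFrame, even with integer labels
first_row_frame = df[0:1]

# A boolean Series inside [] filters rows
large_orders = df[df['amount'] > 100]

# On a Series with integer labels, s[0] would be a label lookup (and fail here),
# so .iloc is the unambiguous way to ask for the first value
s = df['amount']
first_amount = s.iloc[0]

print(type(amount_col).__name__)
print(first_row_frame.index.tolist())
print(large_orders.index.tolist())
print(first_amount)
```

Three bracket calls on one object return a column, a positional row slice, and a filtered set of rows, which is why explicit .loc and .iloc read more safely.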
The Fix
```python
import pandas as pd

data = {'amount': [150.25, 89.90, 412.50]}
indices = [1001, 1002, 1003]
df = pd.DataFrame(data, index=indices)

# Correct 1: To retrieve by relative position (the first row), always use .iloc
first_row_by_pos = df.iloc[0]
print("First row by position:")
print(first_row_by_pos)

# Correct 2: To retrieve by label index, always use .loc
first_row_by_label = df.loc[1001]
print("Row by index label:")
print(first_row_by_label)
```
.loc is designed for label-based indexing, correctly matching the explicit index values (like 1001) of your DataFrame. .iloc is designed for integer-position based indexing, always referring to the 0-based positional order of rows, regardless of their labels. Using them explicitly removes ambiguity and ensures reliable selection.
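The setup also asked for the record with transaction ID 0 if it exists, which a bare df.loc[0] cannot do without raising. One hedged way to handle an optional label is to test membership before calling .loc. get_order is a hypothetical helper name, not a pandas API, and the sketch assumes transaction IDs are unique, as the setup states (with duplicate labels, .loc would return a DataFrame instead of a row).

```python
import pandas as pd

df = pd.DataFrame({'amount': [150.25, 89.90, 412.50]}, index=[1001, 1002, 1003])

def get_order(frame: pd.DataFrame, txn_id: int):
    """Hypothetical helper: return the row labeled txn_id, or None if absent."""
    if txn_id in frame.index:
        return frame.loc[txn_id]
    return None

first_by_position = df.iloc[0]     # first physical row, whatever its label
order_zero = get_order(df, 0)      # None here, since no row is labeled 0
order_1002 = get_order(df, 1002)   # the row labeled 1002

print(first_by_position['amount'])
print(order_zero)
print(order_1002['amount'])
```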
How This Fails in Real Systems
An inventory control script read warehouse bins using df[index]. After a sort, the integer index labels no longer matched the rows' physical positions. Because raw brackets were used, lookups resolved by label rather than by position, so the code retrieved the wrong item counts. The fulfillment center shipped incorrect item categories for three consecutive days before a manual inventory audit caught the mismatch.
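The failure mode in this story can be reproduced in a few lines. A minimal sketch with hypothetical bin data (the sku values and counts are made up for illustration): sort_values keeps each row's original label, so after sorting, label 0 and position 0 point at different rows, and a raw bracket lookup on an integer-labeled Series follows the label.

```python
import pandas as pd

# Hypothetical inventory: default labels 0, 1, 2 match positions before sorting
bins = pd.DataFrame({"sku": ["A-17", "B-02", "C-33"], "count": [40, 5, 12]})

# Sorting reorders rows, but each row keeps its original label
by_count = bins.sort_values("count")

# Label lookup: the row labeled 0 (still sku "A-17", now last)
row_label_0 = by_count.loc[0, "sku"]

# Position lookup: the first row after sorting (smallest count)
row_pos_0 = by_count.iloc[0]["sku"]

# Raw brackets on an integer-labeled Series resolve by label, not position
raw = by_count["sku"][0]

print(by_count.index.tolist())
print(row_label_0, row_pos_0, raw)
```

A script that expected raw to mean "the first bin after sorting" would silently get the row labeled 0 instead, with no error to flag the difference.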
Key Takeaway
Never use df[integer] to select rows by position. Use .iloc for 0-based positions and .loc for index labels, and be especially careful when those labels are themselves integers, since that is exactly where the two are easiest to confuse.