AST Manipulation — Reading & Modifying Python's Syntax Tree
Imagine Python code as a sentence, and the AST as its grammatical parse tree. Each word, phrase, and clause has a specific role (node type) and relationship to others. Manipulating the AST is like dissecting and rebuilding this tree: you need to know the exact parts and how they fit together to correctly alter the sentence's structure or meaning.
The Setup
A development team needs to enforce a custom coding standard: all string literals passed to a specific logging function must be f-strings for better context. Manually checking this in code reviews is proving tedious and error-prone across a large codebase.
What Does This Print?
import ast
code_snippet = """
import logging
def process_data(data):
logging.info("Processing data...")
user_id = data.get('user_id')
logging.debug('User ID: ' + str(user_id))
item_count = len(data.get('items', []))
logging.warning(f"Unexpected item count: {item_count}")
return data
process_data({'user_id': 123, 'items': [1,2,3]})
"""
# Parse the code into an AST
tree = ast.parse(code_snippet)
# A naive visitor to find logging calls (but not checking f-string specifically)
class LoggingCallVisitor(ast.NodeVisitor):
def visit_Call(self, node):
if isinstance(node.func, ast.Attribute) and \
isinstance(node.func.value, ast.Name) and \
node.func.value.id == 'logging' and \
node.func.attr in ['info', 'debug', 'warning', 'error', 'critical']:
print(f"Found logging call at line {node.lineno}: {ast.unparse(node)}")
self.generic_visit(node)
visitor = LoggingCallVisitor()
visitor.visit(tree)
# This 'broken' code only finds calls, it doesn't enforce the f-string rule or modify.
The Output
The current visitor correctly identifies logging calls but cannot distinguish between f-strings, regular strings, or concatenated strings. To enforce the f-string rule, you would need to inspect the arguments passed to the logging call. A literal string (ast.Constant with type str) or a string concatenation (ast.BinOp with ast.Add and ast.Constant) would be flagged as non-compliant. An ast.FormattedValue or an ast.JoinedStr would indicate an f-string.
The challenge is not just identification but transformation: converting a regular string or a concatenated string into an f-string requires careful reconstruction of the AST, which can be complex.
Why Python Does This
When Python executes code, it first parses the source text into an Abstract Syntax Tree (AST). This tree is a hierarchical representation of the code's structure, independent of its textual format. Each node in the AST corresponds to a syntactic construct (e.g., function definition, variable assignment, string literal, function call). The ast module exposes this internal representation, allowing developers to programmatically inspect, analyze, and even modify code. The ast.NodeVisitor traverses this tree, and ast.NodeTransformer can modify it. The challenge arises because Python code is dynamic and expressive, making it non-trivial to cover all edge cases when transforming between different syntactic forms (e.g., str + str(var) vs. f"{var}") without introducing new syntax errors or semantic changes.
The Fix
import ast
import textwrap
code_snippet = textwrap.dedent("""
import logging
def process_data(data):
logging.info("Processing data...")
user_id = data.get('user_id')
logging.debug('User ID: ' + str(user_id)) # This will be flagged/transformed
item_count = len(data.get('items', []))
logging.warning(f"Unexpected item count: {item_count}")
return data
process_data({'user_id': 123, 'items': [1,2,3]})
""")
class FstringEnforcer(ast.NodeTransformer):
def visit_Call(self, node):
self.generic_visit(node) # Crucial: visit children first to allow nested transformations
if isinstance(node.func, ast.Attribute) and \
isinstance(node.func.value, ast.Name) and \
node.func.value.id == 'logging' and \
node.func.attr in ['info', 'debug', 'warning', 'error', 'critical']:
if node.args and isinstance(node.args[0], ast.Constant) and \
isinstance(node.args[0].value, str):
# FIX 1: Transform simple string literals to f-strings.
# A real-world transformer would parse the string for variables.
# For demonstration, we simply wrap it in a JoinedStr node.
original_string = node.args[0].value
new_fstring_node = ast.JoinedStr([ast.Constant(value=original_string)])
node.args[0] = new_fstring_node
print(f"Transformed simple string at line {node.lineno} to f-string.")
elif node.args and isinstance(node.args[0], ast.BinOp) and \
isinstance(node.args[0].op, ast.Add):
# FIX 2: Identify and warn about string concatenations. Transformation is more complex.
print(f"Warning: String concatenation found at line {node.lineno}. Consider converting to f-string.")
return node
tree = ast.parse(code_snippet)
transformer = FstringEnforcer()
new_tree = transformer.visit(tree)
# Fix line numbers and other missing locations after transformations (important for ast.unparse)
ast.fix_missing_locations(new_tree)
print("\n--- Original Code ---")
print(code_snippet)
print("\n--- Transformed Code ---")
print(ast.unparse(new_tree)) # Use ast.unparse to convert modified AST back to code
The fix involves precisely identifying the ast.Call nodes, then delving into their args to check if they are ast.Constant (string literals), ast.JoinedStr (f-strings), or ast.BinOp (string concatenations). To enforce f-string usage, one would transform ast.Constant or ast.BinOp string arguments into ast.JoinedStr or flag them, leveraging the AST's exact representation of the code's structure.
How This Fails in Real Systems
A large Python codebase for an internal microservice platform used a custom decorator to manage database transactions. Developers frequently forgot to use async with for the transaction context, leading to uncommitted changes and data inconsistencies. A senior engineer built an AST transformer that analyzed decorated functions. If it found an await call within such a function but no async with db_session block, it would automatically inject the async with statement and re-write the function's body. This AST-based code generation prevented an entire class of hard-to-debug data integrity issues, reducing developer cognitive load and post-deployment bugs, after spending weeks debugging intermittent data issues.