Python sqlite3.Cursor.arraysize Attribute
Last modified April 15, 2025
This comprehensive guide explores Python's sqlite3.Cursor.arraysize
attribute. We'll cover its purpose, optimization benefits, and practical usage
examples with SQLite databases.
Basic Definitions
The arraysize
attribute of a SQLite cursor determines how many rows
are fetched at once when calling fetchmany
. It optimizes database
operations by reducing round trips.
Default value is 1, meaning each fetchmany
call retrieves one row.
Increasing this value can improve performance when processing large result sets.
Default arraysize Behavior
This example demonstrates the default behavior of arraysize
when
not explicitly set. The cursor fetches one row at a time.
import sqlite3 with sqlite3.connect(':memory:') as conn: conn.execute('CREATE TABLE test (id INTEGER, data TEXT)') conn.executemany('INSERT INTO test VALUES (?, ?)', [(i, f'Data {i}') for i in range(1, 6)]) with conn.cursor() as cursor: cursor.execute('SELECT * FROM test') print(f"Default arraysize: {cursor.arraysize}") # Output: 1 # fetchmany() returns 1 row by default print(cursor.fetchmany()) # [(1, 'Data 1')] print(cursor.fetchmany(2)) # [(2, 'Data 2'), (3, 'Data 3')]
The example shows that without setting arraysize
, it defaults to 1.
The fetchmany
method can still fetch more rows by specifying a
parameter.
This default behavior is safe but may not be optimal for large datasets where multiple rows are typically processed together.
Setting arraysize for Batch Fetching
Here we set arraysize
to fetch multiple rows at once, improving
performance for large result sets.
import sqlite3 with sqlite3.connect('products.db') as conn: conn.execute('''CREATE TABLE IF NOT EXISTS products (id INTEGER PRIMARY KEY, name TEXT, price REAL)''') conn.executemany('INSERT INTO products (name, price) VALUES (?, ?)', [('Laptop', 999.99), ('Phone', 699.99), ('Tablet', 399.99), ('Monitor', 249.99), ('Keyboard', 49.99)]) with conn.cursor() as cursor: cursor.arraysize = 3 # Set batch size to 3 rows cursor.execute('SELECT * FROM products') while True: batch = cursor.fetchmany() if not batch: break print(f"Fetched batch of {len(batch)} items:") for product in batch: print(f" {product[1]}: ${product[2]:.2f}")
This example sets arraysize
to 3, causing fetchmany
to return 3 rows per call. The loop processes products in batches rather than
individually.
Batch fetching reduces database round trips, which is especially beneficial for networked databases or large result sets.
arraysize with Large Datasets
For very large datasets, an appropriate arraysize
can significantly
reduce memory usage and improve performance.
import sqlite3 import time def process_large_data(arraysize): with sqlite3.connect('large_data.db') as conn: conn.execute('CREATE TABLE IF NOT EXISTS measurements (id INTEGER, value REAL)') # Insert sample data (100,000 rows) if conn.execute('SELECT COUNT(*) FROM measurements').fetchone()[0] == 0: conn.executemany('INSERT INTO measurements VALUES (?, ?)', [(i, i * 0.1) for i in range(1, 100001)]) with conn.cursor() as cursor: cursor.arraysize = arraysize start_time = time.time() cursor.execute('SELECT * FROM measurements') total_rows = 0 while True: batch = cursor.fetchmany() if not batch: break total_rows += len(batch) # Process batch here elapsed = time.time() - start_time print(f"arraysize {arraysize}: Processed {total_rows} rows in {elapsed:.3f} seconds") # Compare different array sizes process_large_data(1) # Default process_large_data(100) # Moderate process_large_data(1000) # Large
This benchmark compares different arraysize
values when processing
100,000 rows. Larger sizes generally perform better but require more memory.
The optimal value depends on your specific use case, available memory, and row size. Testing different values is recommended.
arraysize with fetchall()
While arraysize
primarily affects fetchmany
, it can
influence memory usage with fetchall
in some implementations.
import sqlite3 import sys with sqlite3.connect(':memory:') as conn: conn.execute('CREATE TABLE data (id INTEGER, content TEXT)') conn.executemany('INSERT INTO data VALUES (?, ?)', [(i, 'X' * 1000) for i in range(1, 1001)]) # 1MB of data with conn.cursor() as cursor: # Small arraysize cursor.arraysize = 1 print("Memory with arraysize=1:", end=' ') data = cursor.execute('SELECT * FROM data').fetchall() print(f"{sys.getsizeof(data) / 1024:.1f} KB") # Larger arraysize cursor.arraysize = 100 print("Memory with arraysize=100:", end=' ') data = cursor.execute('SELECT * FROM data').fetchall() print(f"{sys.getsizeof(data) / 1024:.1f} KB")
This example shows how arraysize
might affect memory usage even
with fetchall
. The difference varies by Python implementation.
For consistent behavior, explicitly manage batch sizes with fetchmany
when memory efficiency is critical.
Dynamic arraysize Adjustment
The arraysize
can be adjusted dynamically based on query results or
system conditions.
import sqlite3 import psutil def get_memory_usage(): return psutil.virtual_memory().percent with sqlite3.connect('adaptive.db') as conn: conn.execute('''CREATE TABLE IF NOT EXISTS sensor_data (timestamp TEXT, sensor_id INTEGER, value REAL)''') # Insert sample sensor data if conn.execute('SELECT COUNT(*) FROM sensor_data').fetchone()[0] == 0: import datetime now = datetime.datetime.now() data = [(str(now - datetime.timedelta(seconds=i)), i % 10, i * 0.1) for i in range(10000)] conn.executemany('INSERT INTO sensor_data VALUES (?, ?, ?)', data) with conn.cursor() as cursor: cursor.execute('SELECT * FROM sensor_data ORDER BY timestamp DESC') # Start with moderate arraysize cursor.arraysize = 100 while True: # Adjust arraysize based on memory pressure if get_memory_usage() > 70: cursor.arraysize = max(10, cursor.arraysize // 2) elif get_memory_usage() < 50 and cursor.arraysize < 1000: cursor.arraysize = min(1000, cursor.arraysize * 2) batch = cursor.fetchmany() if not batch: break print(f"Processing {len(batch)} rows (arraysize: {cursor.arraysize})") # Process batch here
This advanced example dynamically adjusts arraysize
based on system
memory usage. It reduces batch size when memory is constrained and increases it
when available.
Such adaptive approaches are useful in resource-constrained environments or when processing unpredictable data sizes.
arraysize with Different Row Sizes
The optimal arraysize
may vary depending on the size of each row in
your result set.
import sqlite3 import time def test_row_size(row_size, arraysize): with sqlite3.connect(':memory:') as conn: # Create table with specified row size conn.execute(f'CREATE TABLE data (id INTEGER, content TEXT)') data = [(i, 'X' * row_size) for i in range(1000)] conn.executemany('INSERT INTO data VALUES (?, ?)', data) with conn.cursor() as cursor: cursor.arraysize = arraysize start = time.time() cursor.execute('SELECT * FROM data') total = 0 while True: batch = cursor.fetchmany() if not batch: break total += len(batch) elapsed = time.time() - start print(f"Row size {row_size}B, arraysize {arraysize}: {elapsed:.4f}s") # Test different combinations test_row_size(10, 100) # Small rows, large batch test_row_size(10, 10) # Small rows, small batch test_row_size(1000, 100) # Large rows, large batch test_row_size(1000, 10) # Large rows, small batch
This example demonstrates how row size affects the performance of different
arraysize
values. Larger rows may require smaller batch sizes to
avoid excessive memory usage.
When dealing with large BLOBs or text fields, consider smaller arraysize
values to balance performance and memory consumption.
Best Practices
- Profile performance: Test different arraysize values for your specific use case
- Consider memory: Larger arraysize uses more memory per fetch
- Network considerations: Larger batches reduce round trips for remote databases
- Default is safe: arraysize=1 works for all cases but may be slower
- Document your choice: Comment why you chose a particular arraysize value
Source References
Author
List all Python tutorials.