torch 2.10.0+cu128 | cuda 12.8 | device cuda
Geometric structure of temporal concepts and the limits of sparse autoencoder decomposition.
Shattering of “time” with SAE.
(details to be described).
Environment setup
Load model and SAE
Loaded pretrained model google/gemma-2-2b into HookedTransformer
n_layers=26, d_model=2304
GPU allocated: 15.2GB / 15GB
SAE d_sae: 16384, d_in: 2304
{'d_in': 2304, 'd_sae': 16384, 'dtype': 'bfloat16', 'device': 'cuda', 'apply_b_dec_to_input': False, 'normalize_activations': 'none', 'reshape_activations': 'none', 'metadata': SAEMetadata({'sae_lens_version': '6.44.2', 'sae_lens_training_version': None, 'model_name': 'gemma-2-2b', 'hook_name': 'blocks.20.hook_resid_post', 'hook_head_index': None, 'prepend_bos': True, 'dataset_path': 'monology/pile-uncopyrighted', 'context_size': 1024, 'neuronpedia_id': 'gemma-2-2b/20-gemmascope-res-16k'})}
Reproduction of the days-of-the-week result from the paper "Not All Language Model Features Are One-Dimensionally Linear" (https://arxiv.org/pdf/2405.14860): residual PCA at layer 15

Variance explained: [0.3746124 0.2531936 0.13253301 0.12052244]
Cumulative: [0.3746124 0.62780595 0.76033896 0.8808614 ]
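The two lines above report the share of variance captured by each of the first four principal components and its running sum. A minimal sketch of how such numbers can be obtained from a matrix of residual-stream vectors, assuming one row per prompt; the function name `pca_variance` and the random stand-in data are hypothetical and not taken from the notebook:

```python
import numpy as np

def pca_variance(X, n_components=4):
    """Return (explained_variance_ratio, cumulative) for the top components."""
    Xc = X - X.mean(axis=0, keepdims=True)   # centre each dimension
    # Squared singular values of the centred data are proportional
    # to the variance along each principal component.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    ratio = (var / var.sum())[:n_components]
    return ratio, np.cumsum(ratio)

rng = np.random.default_rng(0)
X = rng.normal(size=(70, 32))   # placeholder for (n_prompts, d_model) residuals
ratio, cumulative = pca_variance(X)
print("Variance explained:", ratio)
print("Cumulative:", cumulative)
```

With real activations, `X` would hold the layer-15 residual vectors at the day-token position, one row per prompt.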
Encoding residuals -> feature activations
Days-of-week SAE decoder PCA
max act: 2040.0 | nonzero: 7208
max act: 2040.0 | nonzero: 7313
max act: 2040.0 | nonzero: 7231
max act: 2040.0 | nonzero: 7475
max act: 2040.0 | nonzero: 7314
max act: 2040.0 | nonzero: 7234
max act: 2040.0 | nonzero: 7276
max act: 2040.0 | nonzero: 7285
max act: 2040.0 | nonzero: 7203
max act: 2040.0 | nonzero: 7308
max act: 2040.0 | nonzero: 7222
max act: 2040.0 | nonzero: 7475
max act: 2040.0 | nonzero: 7310
max act: 2040.0 | nonzero: 7239
max act: 2040.0 | nonzero: 7274
max act: 2040.0 | nonzero: 7282
max act: 2040.0 | nonzero: 7210
max act: 2040.0 | nonzero: 7308
max act: 2040.0 | nonzero: 7225
max act: 2040.0 | nonzero: 7475
max act: 2040.0 | nonzero: 7310
max act: 2040.0 | nonzero: 7238
max act: 2040.0 | nonzero: 7271
max act: 2040.0 | nonzero: 7280
max act: 2040.0 | nonzero: 7202
max act: 2040.0 | nonzero: 7314
max act: 2040.0 | nonzero: 7229
max act: 2040.0 | nonzero: 7475
max act: 2040.0 | nonzero: 7313
max act: 2040.0 | nonzero: 7238
max act: 2040.0 | nonzero: 7268
max act: 2040.0 | nonzero: 7283
max act: 2040.0 | nonzero: 7218
max act: 2040.0 | nonzero: 7316
max act: 2040.0 | nonzero: 7234
max act: 2040.0 | nonzero: 7468
max act: 2040.0 | nonzero: 7309
max act: 2040.0 | nonzero: 7233
max act: 2040.0 | nonzero: 7274
max act: 2040.0 | nonzero: 7284
max act: 2040.0 | nonzero: 7193
max act: 2040.0 | nonzero: 7307
max act: 2040.0 | nonzero: 7219
max act: 2040.0 | nonzero: 7470
max act: 2040.0 | nonzero: 7310
max act: 2040.0 | nonzero: 7233
max act: 2040.0 | nonzero: 7268
max act: 2040.0 | nonzero: 7276
max act: 2040.0 | nonzero: 7211
max act: 2040.0 | nonzero: 7312
max act: 2040.0 | nonzero: 7232
max act: 2040.0 | nonzero: 7470
max act: 2040.0 | nonzero: 7311
max act: 2040.0 | nonzero: 7235
max act: 2040.0 | nonzero: 7280
max act: 2040.0 | nonzero: 7281
Found 429 candidate features active on day prompts
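The log above prints, for each prompt, the largest feature activation and the number of nonzero features, then a count of candidate features. The selection rule is not shown in this output, so the sketch below assumes one plausible criterion: a feature is a candidate if it fires on at least a given fraction of the day prompts. The names `prompt_stats`, `candidate_features`, and `min_frac`, and the toy activation matrix, are hypothetical.

```python
import numpy as np

def prompt_stats(acts):
    """Per-prompt max activation and count of nonzero features."""
    return acts.max(axis=1), (acts > 0).sum(axis=1)

def candidate_features(acts, min_frac=0.5):
    """Indices of features active on at least `min_frac` of the prompts."""
    active_frac = (acts > 0).mean(axis=0)
    return np.flatnonzero(active_frac >= min_frac)

# Toy stand-in for SAE activations: 4 prompts x 6 features.
acts = np.array([
    [0.0, 2.0, 0.0, 1.0, 0.0, 3.0],
    [0.0, 1.5, 0.0, 0.0, 0.0, 2.0],
    [0.5, 2.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0, 0.0, 0.0, 4.0],
])

max_act, nonzero = prompt_stats(acts)
for m, n in zip(max_act, nonzero):
    print(f"max act: {m} | nonzero: {n}")

day_features = candidate_features(acts, min_frac=0.5)
print(f"Found {len(day_features)} candidate features active on day prompts")
```

A stricter `min_frac` keeps only features shared across most prompts; a looser one also admits features tied to a single day.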
---------------------------------------------------------------------------
OutOfMemoryError                          Traceback (most recent call last)
/tmp/ipykernel_142/1980316275.py in <cell line: 0>()
     15
     16 # Get their decoder vectors → (n_features, d_model)
---> 17 W_dec = sae.W_dec.float().cpu().numpy()  # (d_sae, d_model)
     18 day_decoders = W_dec[day_features]
     19

OutOfMemoryError: CUDA out of memory. Tried to allocate 144.00 MiB. GPU 0 has a total capacity of 14.56 GiB of which 103.81 MiB is free. Including non-PyTorch memory, this process has 14.46 GiB memory in use. Of the allocated memory 14.27 GiB is allocated by PyTorch, and 64.77 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
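The 144.00 MiB request in the traceback matches a float32 copy of the whole decoder matrix: `.float()` is applied while the tensor is still on the GPU, so the copy has to fit in device memory before `.cpu()` runs. The arithmetic below checks that figure against the SAE dimensions reported earlier and compares it with casting only the 429 candidate rows. The helper `tensor_mib` is hypothetical, and the reordered calls in the comments are suggestions that have not been run against this notebook.

```python
def tensor_mib(n_rows, n_cols, bytes_per_elem):
    """Size of a dense (n_rows, n_cols) tensor in MiB."""
    return n_rows * n_cols * bytes_per_elem / 2**20

d_sae, d_model = 16384, 2304   # from the SAE config printed above
n_day_features = 429           # candidate features found above

full_fp32 = tensor_mib(d_sae, d_model, 4)           # what .float() on the GPU allocates
rows_fp32 = tensor_mib(n_day_features, d_model, 4)  # cast only the selected rows

print(f"full W_dec as float32: {full_fp32:.2f} MiB")
print(f"429 rows as float32:   {rows_fp32:.2f} MiB")

# Possible reorderings (assumptions, not verified in this environment):
#   select rows first, then cast and move:
#     day_decoders = sae.W_dec[day_features].detach().float().cpu().numpy()
#   or move to the host first and cast there:
#     W_dec = sae.W_dec.detach().cpu().float().numpy()
```

Either reordering keeps the float32 copy off the GPU or shrinks it to a few MiB, which fits within the 103.81 MiB reported as free.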
Load temporal prompts: past, present, future, counterfactual, and hypothetical (from the imported Kaggle dataset)
Geometry at layers 8, 14, 20, and 25
Layer 8 done
Layer 14 done
Layer 20 done
Layer 25 done

Temporal category means at layer 20
max act: 2040.0 | nonzero: 7539
max act: 2040.0 | nonzero: 7409
max act: 2040.0 | nonzero: 7524
max act: 2040.0 | nonzero: 7504
max act: 2040.0 | nonzero: 7503
max act: 2040.0 | nonzero: 7528
max act: 2040.0 | nonzero: 7508
max act: 2040.0 | nonzero: 7520
max act: 2040.0 | nonzero: 7691
max act: 2040.0 | nonzero: 7531
max act: 2040.0 | nonzero: 7645
max act: 2040.0 | nonzero: 7846
max act: 2040.0 | nonzero: 7893
max act: 2040.0 | nonzero: 7535
max act: 2040.0 | nonzero: 7723
max act: 2040.0 | nonzero: 7597
max act: 2040.0 | nonzero: 7454
max act: 2040.0 | nonzero: 7613
max act: 2040.0 | nonzero: 7564
max act: 2040.0 | nonzero: 7724
past: 20 prompts processed
max act: 2040.0 | nonzero: 7547
max act: 2040.0 | nonzero: 7494
max act: 2040.0 | nonzero: 7498
max act: 2040.0 | nonzero: 7586
max act: 2040.0 | nonzero: 7653
max act: 2040.0 | nonzero: 7507
max act: 2040.0 | nonzero: 7781
max act: 2040.0 | nonzero: 7417
max act: 2040.0 | nonzero: 7550
max act: 2040.0 | nonzero: 7392
max act: 2040.0 | nonzero: 7493
max act: 2040.0 | nonzero: 7570
max act: 2040.0 | nonzero: 7713
max act: 2040.0 | nonzero: 7620
max act: 2040.0 | nonzero: 7567
max act: 2040.0 | nonzero: 7579
max act: 2040.0 | nonzero: 7456
max act: 2040.0 | nonzero: 7476
max act: 2040.0 | nonzero: 7780
max act: 2040.0 | nonzero: 7566
present: 20 prompts processed
max act: 2040.0 | nonzero: 7641
max act: 2040.0 | nonzero: 7522
max act: 2040.0 | nonzero: 7558
max act: 2040.0 | nonzero: 7594
max act: 2040.0 | nonzero: 7581
max act: 2040.0 | nonzero: 7584
max act: 2040.0 | nonzero: 7663
max act: 2040.0 | nonzero: 7790
max act: 2040.0 | nonzero: 7721
max act: 2040.0 | nonzero: 7666
max act: 2040.0 | nonzero: 7563
max act: 2040.0 | nonzero: 7727
max act: 2040.0 | nonzero: 7591
max act: 2040.0 | nonzero: 7613
max act: 2040.0 | nonzero: 7546
max act: 2040.0 | nonzero: 7372
max act: 2040.0 | nonzero: 7408
max act: 2040.0 | nonzero: 7619
max act: 2040.0 | nonzero: 7618
max act: 2040.0 | nonzero: 7609
future: 20 prompts processed
max act: 2040.0 | nonzero: 7935
max act: 2040.0 | nonzero: 8004
max act: 2040.0 | nonzero: 8128
max act: 2040.0 | nonzero: 8183
max act: 2040.0 | nonzero: 7943
max act: 2040.0 | nonzero: 8124
max act: 2040.0 | nonzero: 8238
max act: 2040.0 | nonzero: 8123
max act: 2040.0 | nonzero: 8023
max act: 2040.0 | nonzero: 7977
max act: 2040.0 | nonzero: 7697
max act: 2040.0 | nonzero: 7868
max act: 2040.0 | nonzero: 7912
max act: 2040.0 | nonzero: 7928
max act: 2040.0 | nonzero: 7986
max act: 2040.0 | nonzero: 7992
max act: 2040.0 | nonzero: 7873
max act: 2040.0 | nonzero: 7783
max act: 2040.0 | nonzero: 7685
max act: 2040.0 | nonzero: 7886
counterfactual: 20 prompts processed
max act: 2040.0 | nonzero: 7829
max act: 2040.0 | nonzero: 7854
max act: 2040.0 | nonzero: 8083
max act: 2040.0 | nonzero: 7973
max act: 2040.0 | nonzero: 7870
max act: 2040.0 | nonzero: 7886
max act: 2040.0 | nonzero: 7926
max act: 2040.0 | nonzero: 7876
max act: 2040.0 | nonzero: 8087
max act: 2040.0 | nonzero: 7886
max act: 2040.0 | nonzero: 7775
max act: 2040.0 | nonzero: 7857
max act: 2040.0 | nonzero: 7934
max act: 2040.0 | nonzero: 8015
max act: 2040.0 | nonzero: 8054
max act: 2040.0 | nonzero: 8279
max act: 2040.0 | nonzero: 7804
max act: 2040.0 | nonzero: 7836
max act: 2040.0 | nonzero: 7734
max act: 2040.0 | nonzero: 7958
hypothetical: 20 prompts processed
max act: 2040.0 | nonzero: 8065
max act: 2040.0 | nonzero: 7430
max act: 2040.0 | nonzero: 7641
max act: 2040.0 | nonzero: 7597
max act: 2040.0 | nonzero: 7673
max act: 2040.0 | nonzero: 7861
max act: 2040.0 | nonzero: 7523
max act: 2040.0 | nonzero: 7679
max act: 2040.0 | nonzero: 8263
max act: 2040.0 | nonzero: 7745
max act: 2040.0 | nonzero: 7456
max act: 2040.0 | nonzero: 7660
max act: 2040.0 | nonzero: 7432
max act: 2040.0 | nonzero: 7689
max act: 2040.0 | nonzero: 7903
max act: 2040.0 | nonzero: 7417
max act: 2040.0 | nonzero: 7797
max act: 2040.0 | nonzero: 7889
max act: 2040.0 | nonzero: 7673
max act: 2040.0 | nonzero: 7594
neutral: 20 prompts processed
Top differential features per category
=== PAST ===
feature_id diff_score past_mean neutral_mean neuronpedia
1858 24.0000 64.0000 40.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/1858
2230 16.2500 16.2500 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/2230
1548 15.5625 15.5625 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/1548
2914 14.8125 16.1250 1.289062 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/2914
12545 14.1875 14.1875 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/12545
6631 13.7500 52.2500 38.500000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/6631
2229 13.0000 64.5000 51.500000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/2229
2238 12.8125 26.0000 13.187500 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/2238
15383 11.5000 11.5000 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/15383
5571 11.3750 12.1875 0.789062 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/5571
5890 11.0000 11.5000 0.474609 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/5890
1306 10.5000 10.5000 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/1306
12265 9.6250 12.5000 2.859375 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/12265
7116 9.5000 9.5000 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/7116
10377 9.4375 9.4375 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/10377
=== PRESENT ===
feature_id diff_score present_mean neutral_mean neuronpedia
1858 19.25000 59.25000 40.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/1858
6631 15.00000 53.50000 38.500000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/6631
2230 14.31250 14.31250 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/2230
2914 13.75000 15.06250 1.289062 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/2914
12545 12.75000 12.75000 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/12545
2489 12.43750 12.43750 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/2489
2238 12.18750 25.37500 13.187500 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/2238
7116 11.62500 11.62500 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/7116
2229 11.50000 63.00000 51.500000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/2229
12265 11.00000 13.87500 2.859375 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/12265
5890 9.75000 10.25000 0.474609 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/5890
3971 8.87500 14.25000 5.375000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/3971
9768 8.75000 44.00000 35.250000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/9768
15383 8.68750 8.68750 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/15383
4373 7.59375 7.59375 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/4373
=== FUTURE ===
feature_id diff_score future_mean neutral_mean neuronpedia
16148 17.5000 17.5000 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/16148
2230 16.3750 16.3750 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/2230
6631 15.7500 54.2500 38.500000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/6631
2229 14.5000 66.0000 51.500000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/2229
1858 13.0000 53.0000 40.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/1858
9520 12.3750 12.3750 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/9520
7569 12.1250 12.1250 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/7569
1322 12.0000 12.4375 0.421875 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/1322
2238 11.9375 25.1250 13.187500 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/2238
12265 11.8750 14.7500 2.859375 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/12265
12545 11.5625 11.5625 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/12545
5890 11.1250 11.6250 0.474609 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/5890
1630 8.6250 8.6250 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/1630
7116 8.4375 8.4375 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/7116
12341 8.2500 8.7500 0.519531 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/12341
=== COUNTERFACTUAL ===
feature_id diff_score counterfactual_mean neutral_mean neuronpedia
1858 29.0000 69.0000 40.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/1858
15383 19.0000 19.0000 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/15383
5571 18.8750 19.6250 0.789062 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/5571
2238 17.2500 30.5000 13.187500 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/2238
14267 17.0000 17.0000 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/14267
12545 16.2500 16.2500 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/12545
2230 14.1250 14.1250 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/2230
9909 13.7500 22.0000 8.250000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/9909
5890 13.3125 13.8125 0.474609 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/5890
9070 12.5625 12.5625 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/9070
12265 11.6250 14.5000 2.859375 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/12265
9492 11.1250 11.1250 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/9492
1720 11.0625 11.5000 0.437500 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/1720
5371 11.0625 11.8750 0.828125 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/5371
7116 10.1875 10.1875 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/7116
=== HYPOTHETICAL ===
feature_id diff_score hypothetical_mean neutral_mean neuronpedia
1858 18.25000 58.25000 40.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/1858
1720 16.75000 17.12500 0.437500 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/1720
2238 13.68750 26.87500 13.187500 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/2238
5890 10.25000 10.75000 0.474609 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/5890
5571 10.25000 11.06250 0.789062 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/5571
2230 9.31250 9.31250 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/2230
15383 7.90625 7.90625 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/15383
16148 7.56250 7.56250 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/16148
12265 7.00000 9.87500 2.859375 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/12265
14041 6.81250 7.31250 0.507812 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/14041
9635 6.68750 6.68750 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/9635
9070 6.25000 6.25000 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/9070
1548 6.18750 6.18750 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/1548
2956 5.93750 6.31250 0.373047 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/2956
12545 5.90625 5.90625 0.000000 https://neuronpedia.org/gemma-2-2b/20-gemmascope-res-16k/12545
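In the tables above, `diff_score` is the category mean minus the neutral mean; for feature 1858 in the PAST table, 64.0 - 40.0 = 24.0. A minimal sketch of that ranking, assuming per-category mean activation vectors are already computed. The function name `top_differential` is hypothetical, and only the first three entries of the toy vectors come from the PAST table; the last two are made up.

```python
import numpy as np

def top_differential(cat_mean, neutral_mean, k=3):
    """Rank features by (category mean - neutral mean), largest first."""
    diff = cat_mean - neutral_mean
    order = np.argsort(-diff, kind="stable")[:k]
    return [(int(i), float(diff[i]), float(cat_mean[i]), float(neutral_mean[i]))
            for i in order]

# Features 1858, 2230, 6631 use values from the PAST table;
# ids 11 and 12 are made-up filler rows.
feature_ids  = np.array([1858, 2230, 6631, 11, 12])
past_mean    = np.array([64.0, 16.25, 52.25, 5.0, 0.0])
neutral_mean = np.array([40.0, 0.0, 38.5, 6.0, 0.0])

rows = top_differential(past_mean, neutral_mean, k=3)
for i, diff, cat, neu in rows:
    print(feature_ids[i], diff, cat, neu)
```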
Temporal geometry PCA

Variance explained: [0.4102727 0.33915818 0.13833533 0.11223382]
Cumulative: [0.4102727 0.7494309 0.88776624 1. ]
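The cumulative variance reaches exactly 1 at the fourth component. That is the expected outcome if the PCA is fit on five category mean vectors, which is an assumption here since the input to this cell is not shown: centring five points leaves at most four directions with nonzero variance. The sketch below illustrates this with random placeholder vectors, not the notebook's data.

```python
import numpy as np

rng = np.random.default_rng(0)
means = rng.normal(size=(5, 2304))   # 5 placeholder category means, d_model = 2304

centred = means - means.mean(axis=0, keepdims=True)
s = np.linalg.svd(centred, compute_uv=False)   # 5 singular values
ratio = s**2 / (s**2).sum()

print("Variance explained:", ratio[:4])
print("Cumulative:", np.cumsum(ratio[:4]))
print("5th component share:", ratio[4])
```

Under that assumption, the 100% figure is a consequence of the number of points rather than evidence that the temporal categories occupy a four-dimensional subspace; fitting the PCA on per-prompt vectors would give a more informative spectrum.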
:::