Pretraining Checkpoint Gallery

How state-coordinated media exposure during pretraining shapes model responses

This page shows how LLM responses change as models are exposed to more state-coordinated media during pretraining. We performed additional pretraining of Llama-2-13b on three corpora, saving checkpoints at increasing amounts of training data: state-scripted media, non-scripted state-controlled media, and CulturaX (a general web-corpus baseline).

The Y-axis shows the proportion of prompts for which the model with additional pretraining produces a more favorable response than the baseline Llama 2 model (instruction fine-tuning only, no additional pretraining). A score of 0.5 means the trained model is indistinguishable from the baseline, i.e., chance level.
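As an illustration, the score plotted on the Y-axis is a simple win proportion over head-to-head comparisons. The sketch below assumes the pairwise judgments have already been made; the list of booleans stands in for whatever pairwise favorability evaluation produced them.

```python
def favorability_score(comparisons):
    """Proportion of head-to-head comparisons won by the trained model.

    `comparisons` is a list of booleans: True where the model with
    additional pretraining produced the more favorable response.
    """
    return sum(comparisons) / len(comparisons)

# Hypothetical outcomes for 10 prompts: the trained model wins 8.
wins = [True] * 8 + [False] * 2
print(favorability_score(wins))  # 0.8, well above the 0.5 chance level
```

A score of exactly 0.5 corresponds to the trained and baseline models winning equally often.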

Pretraining on State-Coordinated Media by Country and Language

The figure below shows results for Chinese-language prompts about China (its leaders, institutions, and politics). All three corpora make model outputs more favorable, but the effect is strongest for state-scripted media: with just 6,400 training documents, the model with additional pretraining produces the more pro-government response in roughly 80% of head-to-head comparisons against the base model.

Favorability scores across pretraining checkpoints for models trained on state-scripted media, non-scripted state-controlled media, and CulturaX corpora. Chinese-language prompts about China only.

Use the filters below to explore results beyond the default view (China-focused prompts in Chinese). This interactive shows Chinese- and English-prompt results only; for spillover to other languages, see the multilingual figure further down the page.

Example Response

The baseline (left) is the Llama 2 model with instruction fine-tuning only. The trained model (right) received additional pretraining on 64,000 state-scripted media documents.


Cross-Lingual Spillover

Additional pretraining on state-coordinated media spills over into other languages, with the largest effects in languages that share a writing system (and thus overlapping tokens), such as traditional Chinese and Japanese.
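The writing-system overlap behind this spillover can be illustrated with a toy character-level comparison. This is a sketch with hand-picked example sentences, not the evaluation from the study; a real measurement would compare token overlap under the model's own tokenizer vocabulary.

```python
def char_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the character sets of two strings."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

# Hand-picked example sentences, assumed for illustration only.
simplified  = "中国政府发表声明"            # Simplified Chinese
traditional = "中國政府發表聲明"            # Traditional Chinese
japanese    = "中国政府は声明を発表した"    # Japanese (shares Han characters)
english     = "The government issued a statement"

# Scripts sharing Han characters overlap heavily; English shares nothing.
print(char_overlap(simplified, traditional))
print(char_overlap(simplified, japanese))
print(char_overlap(simplified, english))
```

Languages whose scripts share characters also share subword tokens, so gradient updates on Chinese state media text touch embeddings that traditional Chinese and Japanese prompts reuse.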

Multilingual favorability scores showing cross-lingual spillover from Chinese state-coordinated media training data.

Select a language to explore the spillover effect interactively.

Example Response