Abstract
Evaluating and optimising authorial style in long-form story generation remains challenging because style is often assessed with ad hoc prompting and is frequently conflated with overall writing quality. We propose a two-stage pipeline. First, we train a dedicated style-similarity judge by fine-tuning a sentence-transformer with authorship-verification (AV) supervision, and calibrate its similarity outputs into a bounded [0,1] reward. Second, we use this judge as the primary reward in Group Relative Policy Optimization (GRPO) to fine-tune an 8B story generator for style-conditioned writing. This avoids the accepted/rejected preference pairs required by Direct Preference Optimization (DPO). Across four target authors (Mark Twain, Jane Austen, Charles Dickens, Thomas Hardy), the GRPO-trained 8B model achieves higher style scores than open-weight baselines, with an average style score of 0.893. These results suggest that AV-calibrated reward modelling provides a practical mechanism for controllable style transfer in long-form generation at a moderate model size and training budget.
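The abstract does not state how the judge's similarity scores are calibrated or how GRPO consumes the resulting reward. The sketch below is illustrative only and is not taken from the paper's code. It assumes a logistic (Platt-style) calibration from cosine similarity to a bounded reward, with hypothetical parameters `a` and `b`. It then applies the standard GRPO group-relative advantage, in which each sampled story's reward is normalised by the mean and standard deviation of its own sampling group.

```python
import math
from statistics import mean, pstdev


def calibrated_style_reward(cosine_sim: float, a: float = 10.0, b: float = -5.0) -> float:
    """Map a raw style-judge cosine similarity to a bounded reward in (0, 1).

    Assumption: a logistic (Platt-style) calibration. The slope `a` and
    offset `b` are hypothetical placeholders; in practice they would be
    fitted on held-out same-author / different-author pairs.
    """
    return 1.0 / (1.0 + math.exp(-(a * cosine_sim + b)))


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for one prompt's group of sampled completions.

    Each reward is centred on the group mean and scaled by the group's
    standard deviation, so no learned value function is needed.
    `eps` avoids division by zero when all rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical judge similarities for four stories sampled from one prompt.
    sims = [0.9, 0.75, 0.6, 0.4]
    rewards = [calibrated_style_reward(s) for s in sims]
    advantages = group_relative_advantages(rewards)
    for s, r, adv in zip(sims, rewards, advantages):
        print(f"sim={s:.2f}  reward={r:.3f}  advantage={adv:+.3f}")
```

In this sketch, higher judge similarity yields a higher bounded reward. Within each group, the most author-like sample receives the largest positive advantage. Because the baseline comes from the group itself rather than from a critic or from labelled preference pairs, a scalar style reward is sufficient for training.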
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 30th Conference on Computational Natural Language Learning |
| Publisher | Association for Computational Linguistics, ACL |
| Publication status | Accepted/In press - 21 Apr 2026 |
| Event | 30th Conference on Computational Natural Language Learning, San Diego, United States. Duration: 3 Jul 2026 → 4 Jul 2026. https://conll.org/2026 |
Conference
| Conference | 30th Conference on Computational Natural Language Learning |
|---|---|
| Abbreviated title | CoNLL 2026 |
| Country/Territory | United States |
| City | San Diego |
| Period | 3/07/26 → 4/07/26 |
| Internet address | https://conll.org/2026 |
Bibliographical note
Not yet published as of 21/04/2026.
Research output
- 1 Preprint
- Capturing Classic Authorial Style in Long-Form Story Generation with GRPO Fine-Tuning
  Liu, J., Bahja, M., Kovatchev, V. & Lee, M., 5 Dec 2025, arXiv. Research output: Working paper/Preprint › Preprint