Research Design Conversation: AI-Generated Summaries & Text Retention
Conversation between User and Undermind AI

USER
Standard version
Role: You are a rigorous evidence-synthesis and research-design assistant.

Research question (PICO):
• Population: College/undergraduate students (18–25); include near-peers (older undergrads) but tag separately.
• Intervention: Reading AI-generated summaries of course-relevant texts. Specify summary type (abstractive/extractive), length (tokens/percent of source), model/version, and whether it includes learning supports (headings, questions, highlights).
• Comparator: Reading the full text (same content), matched for topic.
• Outcomes: Text retention at ~24 hours and ~7 days using free recall, cued recall, and recognition; also metacognitive judgments (JOLs), comprehension, and transfer if available.
• Setting: Lab, classroom, or online study with college students.

Tasks:
Scoping & retrieval
• Search across education, psych, HCI, EdTech venues. Include near-topics: summarization effects, time-on-task controls, note-taking with summaries, LLM-generated study aids.
• Capture grey literature (theses, preregs, tech reports) and replication attempts.
Evidence extraction
• For each study: design, N, population, materials (domain, length, difficulty), intervention details (summary generator, length, prompts), comparator parity, retention intervals, measures, stats, and time-on-task.
• Record effect sizes (d, g, OR) or raw stats to compute them; note if conversion is impossible and why.
Bias & quality
• Rate with adapted tools (e.g., RoB-2 for experiments; selection/performance/detection/reporting bias). Note preregistration, power, and materials/data sharing.
Gap mapping & synthesis
• Build a 2×N gap map: interval (24h, 7d) × design/moderators (summary type, length, domain, prior knowledge, time-on-task controlled?, assessment type).
• Identify where evidence is thin, contradictory, or methodologically weak.
• Separate mechanism gaps (e.g., cognitive load, desirable difficulties) from measurement gaps (e.g., only recognition tests).
Actionable next steps
• Propose 3 study blueprints (lab, classroom RCT, preregistered online) with: hypotheses, manipulations, control of time-on-task and exposure, sample size (power for d=0.25–0.35; see the power sketch after this prompt), materials plan, prereg notes, and analysis plan (mixed models; interval × condition; robustness checks).

Report format:
A. Executive summary (≤200 words).
B. Gap map table (heatmap counts + quality shading).
C. Top 8 research gaps (each with why-it-matters + quick test).
D. Effect-size harvest plot (if feasible).
E. Study blueprints (3 specs).
F. Bibliography (tagged, with links and replication status).

Constraints: Prioritize post-2015; tag pre-2015 classics. Prefer peer-reviewed; label grey literature. Distinguish LLM-generated vs human-written summaries.
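The blueprint item above asks for power at d = 0.25–0.35. A minimal sketch of that calculation, assuming a two-group comparison at a single retention interval with α = .05 and 80% power; this is a simplification relative to the mixed-model designs the prompt specifies, so treat it as a rough anchor rather than the final power analysis:

```python
# Minimal power sketch for the study blueprints (assumed design:
# between-subjects summary vs. full-text contrast at one interval).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Per-group n for a two-sided independent-samples t-test at alpha = .05,
# power = .80, across the targeted effect-size band d = 0.25-0.35.
for d in (0.25, 0.30, 0.35):
    n_per_group = analysis.solve_power(effect_size=d, alpha=0.05,
                                       power=0.80, alternative="two-sided")
    print(f"d = {d:.2f}: ~{n_per_group:.0f} participants per group")
```

At d = 0.25 this comes to roughly 250 participants per group, which is one reason the preregistered online blueprint is likely the most realistic route to adequate power.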
UNDERMIND AI
Excellent. This is a well-scoped, methods-forward evidence program. I want to ensure I capture your exact target and the boundaries before I crystallize the topic statement.

Target construct and inclusion clarifications
1. "AI-generated summaries": Is your inclusion limited to LLMs (e.g., GPT-3/4, Claude, PaLM) and neural abstractive/extractive systems post-2015, or do you want to include earlier algorithmic summarizers (e.g., LexRank, TextRank, MEAD) as long as authorship is "machine-generated"? If included, I'll tag non-LLM vs LLM.
2. Comparator parity: If studies use full text plus equal time vs summary with shorter time, do you want two strata: (a) matched time-on-task; (b) matched exposure-to-content (even if time differs)? I can tag both and treat parity as a moderator.
3. Materials domain: Is any academic content eligible (STEM, social sciences, humanities)? Should news/non-course prose be excluded unless used in a university course context?
4. Outcome hierarchy: If multiple outcomes exist, should retention (free/cued/recognition) at 24h and 7d be primary, with metacognition/comprehension/transfer secondary? Any minimum test reliability requirements?
5. Retention windows tolerance: Will you accept "approximately 24h" as 18–30h and "approximately 7d" as 6–9 days?
6. Near-peers: You want older undergrads included but tagged. Should graduate students be excluded or included as a separate tag?
7. Setting: OK to include crowdsourced college samples (e.g., Prolific participants self-reporting current enrollment) if age 18–25 and status verified, or exclude unless verified by university enrollment?
8. Intervention details: Do you want to include studies where summaries are co-presented with adjunct cues (highlights, headings, questions) only if they're part of the AI output, or also if added by researchers? I'll tag both but can exclude the latter if you want clean AI-only effects.
9. Language and geography: English only, or any language if participants are proficient and materials are course-relevant?
10. Grey literature scope: Include preprints (arXiv, PsyArXiv), theses, preregs, and tech reports; exclude corporate blog posts unless they provide methods/data?
11. Date/venues: Prioritize post-2015; include pre-2015 classics on summarization effects and note-taking with summaries as background. Confirm this is acceptable.

If you're comfortable with these defaults: include all machine-generated summaries (tag LLM vs non-LLM), broad academic domains, both parity strata, accept 18–30h and 6–9d windows, include verified college online samples, include researcher-added supports (tagged), English-only unless otherwise specified. Please confirm or adjust any of the above. Once confirmed, I'll lock the statement.

USER
Locked topic statement
Topic: Do AI-generated summaries reduce delayed text retention relative to reading full texts among college students?
Population: Undergraduate college students (18–25; older undergrads tagged). Graduate students included as a tagged sensitivity analysis only.
Intervention: Machine-generated summaries of course-relevant texts (LLM or non-LLM; abstractive/extractive; summary length as % of source or token count; model/version; prompt; presence of AI-produced learning supports). Researcher-added supports tagged separately.
Comparator: Reading the full text covering the same content. Parity strata: (a) matched time-on-task; (b) matched exposure-to-content (time may differ). Parity is a moderator.
Primary outcomes: Delayed retention at ~24h (18–30h) and ~7d (6–9d) via free recall, cued recall, and recognition. Record reliability and item counts.
Secondary outcomes: JOLs/metacognitive accuracy, comprehension, transfer, cognitive load, engagement.
Settings: Lab, classroom, or verified online college samples; self-report-only online samples included but tagged as lower quality.
Materials: College-level texts from STEM/SS/Humanities; exclude non-course news unless used in coursework.
Languages: Any; tag L1/L2 and language of materials; provide an English-only sensitivity analysis.
Study quality coding: Randomization, preregistration, power, parity type, exposure controls, materials/data availability, verification of enrollment.
Scope of evidence: 2015–present prioritized for effect synthesis; pre-2015 classics summarized for theoretical context only.
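Report item D asks for an effect-size harvest, and the extraction step asks for raw stats sufficient to compute d or g. A minimal sketch of that conversion using the standard pooled-SD formulas, assuming studies report per-group means, SDs, and ns; the numbers passed in are hypothetical placeholders, not data from any study:

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g from per-group means, SDs, and ns (standard formulas)."""
    # Pooled standard deviation across the two groups.
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp                    # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)   # small-sample correction factor
    return j * d

# Hypothetical illustration only (not from any study): summary group
# M=14.2 (SD=4.1, n=40) vs full-text group M=16.0 (SD=4.3, n=42).
print(round(hedges_g(14.2, 4.1, 40, 16.0, 4.3, 42), 3))
```

A negative g under this coding means the summary group retained less than the full-text group, matching the direction of the locked research question.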
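The analysis plan in the prompt calls for mixed models with an interval × condition term. A minimal sketch of that model in statsmodels, assuming a long-format table with one row per participant per retention test; the column names and the synthetic data are illustrative assumptions, not a schema from this conversation:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic long-format data for illustration only: 60 participants,
# each tested at both intervals. Column names are assumed placeholders.
rng = np.random.default_rng(1)
n = 60
df = pd.DataFrame({
    "subject":   np.repeat(np.arange(n), 2),
    "condition": np.repeat(rng.choice(["summary", "full_text"], n), 2),
    "interval":  np.tile(["24h", "7d"], n),
})
df["score"] = rng.normal(20, 4, 2 * n)  # placeholder retention scores

# Random intercept per participant; the condition x interval term tests
# whether any summary deficit widens at the longer delay.
model = smf.mixedlm("score ~ condition * interval", data=df,
                    groups=df["subject"])
print(model.fit().summary())
```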
Research Tools for Evidence Mapping and Discovery
Elicit – AI assistant for literature discovery and hypothesis scoping.
Consensus – AI tool that extracts and summarizes scientific consensus.
Litmaps – Visual literature-mapping platform for citation exploration.
These tools streamline literature review, gap mapping, and synthesis, supporting evidence-based research workflows.
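Whichever tool handles discovery, the 2×N gap map itself is a small data-manipulation task once studies are coded. A minimal sketch in pandas, assuming a coded extraction table; the rows and column names here are hypothetical, not extracted evidence:

```python
import pandas as pd

# Hypothetical coded-extraction rows: illustration only, not real studies.
studies = pd.DataFrame({
    "interval":       ["24h", "24h", "7d", "7d", "24h"],
    "summary_type":   ["abstractive", "extractive", "abstractive",
                       "abstractive", "extractive"],
    "tot_controlled": [True, False, True, False, True],
})

# Cell counts for the gap map: retention interval x (summary type,
# time-on-task controlled?). Zero cells mark where evidence is thin.
gap_map = pd.crosstab(studies["interval"],
                      [studies["summary_type"], studies["tot_controlled"]])
print(gap_map)
```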