Table R1. Results for each task, reported every 100k iterations. Values are success rate (%) averaged over 4 random seeds (mean ± std).

| Task | 100k | 200k | 300k | 400k | 500k | 600k | 700k | 800k | 900k | 1M |
|---|---|---|---|---|---|---|---|---|---|---|
| pointmaze-teleport-navigate-v0 | 44.9 ± 6.6 | 47.2 ± 1.4 | 51.9 ± 3.9 | 52.8 ± 2.0 | 51.7 ± 1.0 | 51.6 ± 4.6 | 50.0 ± 5.2 | 54.1 ± 2.4 | 53.5 ± 7.3 | 53.9 ± 7.0 |
| pointmaze-teleport-stitch-v0 | 25.3 ± 12.5 | 38.9 ± 4.5 | 38.7 ± 2.4 | 41.2 ± 6.9 | 37.3 ± 7.6 | 38.9 ± 8.1 | 40.9 ± 7.6 | 42.6 ± 4.2 | 38.0 ± 4.9 | 41.8 ± 6.0 |
| pointmaze-medium-stitch-v0 | 63.5 ± 5.0 | 79.5 ± 5.8 | 91.3 ± 5.1 | 92.3 ± 6.5 | 96.1 ± 3.4 | 94.0 ± 7.7 | 97.2 ± 1.7 | 97.0 ± 5.9 | 98.8 ± 3.1 | 98.2 ± 5.3 |
| pointmaze-large-stitch-v0 | 22.3 ± 1.0 | 27.3 ± 8.3 | 31.2 ± 2.4 | 26.1 ± 3.2 | 29.2 ± 7.4 | 28.9 ± 8.1 | 29.9 ± 7.4 | 31.1 ± 9.8 | 31.1 ± 6.5 | 28.0 ± 7.0 |
| pointmaze-giant-stitch-v0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 |
| antmaze-teleport-navigate-v0 | 39.5 ± 2.6 | 44.9 ± 1.6 | 46.7 ± 6.4 | 54.4 ± 4.6 | 50.3 ± 6.5 | 52.0 ± 2.1 | 50.1 ± 6.4 | 54.3 ± 2.2 | 47.6 ± 3.8 | 54.1 ± 1.7 |
| antmaze-teleport-stitch-v0 | 38.9 ± 3.0 | 38.3 ± 2.3 | 40.4 ± 5.6 | 42.3 ± 4.0 | 42.1 ± 4.7 | 41.5 ± 6.1 | 39.5 ± 10.2 | 43.6 ± 7.0 | 43.6 ± 5.9 | 44.9 ± 4.7 |
| antmaze-teleport-explore-v0 | 16.9 ± 3.8 | 20.4 ± 3.8 | 18.7 ± 3.9 | 24.0 ± 2.4 | 24.1 ± 3.4 | 26.0 ± 5.7 | 26.3 ± 4.2 | 26.4 ± 1.4 | 23.9 ± 2.8 | 28.0 ± 2.8 |
| antmaze-medium-stitch-v0 | 56.7 ± 11.9 | 61.5 ± 4.3 | 73.2 ± 5.6 | 73.1 ± 3.7 | 72.8 ± 5.6 | 78.8 ± 7.3 | 76.8 ± 6.9 | 79.3 ± 3.4 | 81.5 ± 3.1 | 82.2 ± 2.4 |
| antmaze-large-stitch-v0 | 14.3 ± 3.6 | 14.3 ± 4.8 | 13.5 ± 0.6 | 12.0 ± 4.8 | 14.4 ± 1.6 | 13.2 ± 2.1 | 13.9 ± 1.8 | 14.1 ± 1.0 | 14.7 ± 2.3 | 16.4 ± 1.4 |
| antmaze-giant-stitch-v0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 |
| cube-single-play-v0 | 68.9 ± 6.7 | 60.1 ± 4.0 | 49.7 ± 9.4 | 46.4 ± 8.8 | 41.5 ± 6.0 | 46.5 ± 5.7 | 41.6 ± 8.0 | 47.9 ± 9.6 | 45.2 ± 6.4 | 48.1 ± 6.0 |
| cube-single-noisy-v0 | 81.3 ± 14.1 | 76.4 ± 13.9 | 78.9 ± 6.2 | 75.1 ± 5.2 | 68.7 ± 6.4 | 67.2 ± 3.8 | 69.5 ± 7.7 | 63.9 ± 1.4 | 69.3 ± 1.4 | 69.1 ± 3.4 |
| scene-play-v0 | 46.4 ± 6.9 | 45.9 ± 5.6 | 45.2 ± 6.9 | 46.1 ± 5.8 | 48.9 ± 7.7 | 50.9 ± 5.1 | 49.2 ± 3.2 | 47.1 ± 4.7 | 43.1 ± 5.8 | 46.5 ± 4.0 |
| scene-noisy-v0 | 27.6 ± 2.2 | 27.2 ± 10.1 | 34.1 ± 9.0 | 35.1 ± 6.8 | 39.7 ± 8.6 | 38.5 ± 7.9 | 33.2 ± 8.7 | 33.9 ± 11.0 | 35.2 ± 11.0 | 35.7 ± 12.7 |

Table R2. Ablation comparing HIQL, NFTR, FMTR, and DMTR. Values are success rate (%) averaged over 3 random seeds (mean ± std). FMTR and DMTR use the same hierarchy and triangle-slack reweighting as NFTR; the high-level subgoal head is a conditional flow matching model in FMTR and a diffusion model with DDIM sampling in DMTR.

| Task | HIQL | NFTR | FMTR | DMTR |
|---|---|---|---|---|
| pointmaze-teleport-navigate-v0 | 18 ± 4 | 43.9 ± 7.8 | 37.9 ± 0.2 | 35.2 ± 3.7 |
| pointmaze-teleport-stitch-v0 | 34 ± 4 | 48.6 ± 1.2 | 45.7 ± 1.0 | 37.9 ± 5.7 |
| pointmaze-medium-stitch-v0 | 74 ± 6 | 93.9 ± 6.2 | 94.9 ± 0.7 | 95.5 ± 2.7 |
| antmaze-teleport-navigate-v0 | 42 ± 3 | 52.0 ± 1.6 | 43.3 ± 3.0 | 43.3 ± 5.1 |
| antmaze-teleport-stitch-v0 | 36 ± 2 | 42.2 ± 1.4 | 40.2 ± 0.9 | 35.9 ± 3.2 |
| cube-single-play-v0 | 15 ± 3 | 47.1 ± 9.9 | 35.6 ± 8.6 | 17.4 ± 4.5 |
| scene-noisy-v0 | 25 ± 4 | 32.6 ± 5.8 | 42.9 ± 1.0 | 31.4 ± 1.7 |

Table R3. Results of NFTR in the puzzle environment, with HIQL for comparison. Values are success rate (%) averaged over 3 random seeds (mean ± std).

| Task | HIQL | NFTR |
|---|---|---|
| puzzle-3x3-play-v0 | 12 ± 2 | 14.9 ± 2.2 |
| puzzle-4x4-play-v0 | 7 ± 2 | 38.1 ± 6.9 |
| puzzle-4x5-play-v0 | 4 ± 1 | 6.7 ± 1.0 |
| puzzle-4x6-play-v0 | 3 ± 1 | 3.5 ± 0.2 |
| puzzle-3x3-noisy-v0 | 51 ± 11 | 18.0 ± 2.3 |
| puzzle-4x4-noisy-v0 | 16 ± 4 | 53.0 ± 6.7 |
| puzzle-4x5-noisy-v0 | 5 ± 1 | 8.8 ± 0.8 |
| puzzle-4x6-noisy-v0 | 2 ± 1 | 5.2 ± 2.4 |

Table R4. Results of NFTR in the humanoidmaze environment, with HIQL for comparison. Values are success rate (%) averaged over 3 random seeds (mean ± std).

| Task | HIQL | NFTR |
|---|---|---|
| humanoidmaze-medium-navigate-v0 | 89 ± 2 | 49.5 ± 3.8 |
| humanoidmaze-large-navigate-v0 | 49 ± 4 | 7.7 ± 1.5 |
| humanoidmaze-medium-stitch-v0 | 88 ± 2 | 62.9 ± 7.5 |
| humanoidmaze-large-stitch-v0 | 28 ± 3 | 7.8 ± 2.3 |

Table R5. Ablation of κ ∈ {0, 0.5, 1, 2, 5} across environments. Values are success rate (%) averaged over 4 random seeds (mean ± std).
| Task | κ = 0.0 | κ = 0.5 | κ = 1.0 | κ = 2.0 | κ = 5.0 |
|---|---|---|---|---|---|
| pointmaze-teleport-navigate-v0 | 48.2 ± 3.6 | 48.7 ± 4.5 | 48.6 ± 10.5 | 53.8 ± 5.9 | 47.3 ± 2.5 |
| pointmaze-teleport-stitch-v0 | 42.0 ± 4.7 | 37.1 ± 9.1 | 40.8 ± 8.8 | 40.6 ± 7.9 | 40.0 ± 8.9 |
| pointmaze-medium-stitch-v0 | 98.0 ± 0.2 | 98.1 ± 1.2 | 95.3 ± 2.9 | 86.4 ± 8.7 | 87.9 ± 5.5 |
| pointmaze-large-stitch-v0 | 30.3 ± 6.3 | −¹ | 30.0 ± 5.7 | 31.8 ± 5.9 | −¹ |
| antmaze-teleport-navigate-v0 | 49.9 ± 0.3 | 50.5 ± 0.3 | 52.0 ± 0.6 | 48.0 ± 3.1 | 41.9 ± 3.2 |
| antmaze-teleport-stitch-v0 | 41.1 ± 3.0 | 40.5 ± 3.7 | 44.0 ± 5.5 | 41.6 ± 1.6 | 33.1 ± 3.4 |
| antmaze-teleport-explore-v0 | 14.2 ± 2.7 | 17.6 ± 4.5 | 19.8 ± 3.1 | 19.0 ± 1.4 | −¹ |
| antmaze-medium-stitch-v0 | 81.0 ± 3.7 | 80.8 ± 4.3 | 76.0 ± 5.5 | 70.2 ± 8.5 | 31.0 ± 6.2 |
| antmaze-large-stitch-v0 | 16.9 ± 2.1 | 13.6 ± 2.5 | 14.4 ± 2.4 | 15.9 ± 0.8 | 11.7 ± 2.1 |
| cube-single-play-v0 | 22.2 ± 6.0 | 24.7 ± 4.3 | 31.3 ± 10.0 | 47.1 ± 9.9 | 46.2 ± 4.4 |
| cube-single-noisy-v0 | 55.2 ± 3.3 | 62.8 ± 5.1 | 64.0 ± 2.7 | 67.4 ± 0.9 | 59.4 ± 3.9 |
| scene-play-v0 | 51.7 ± 3.7 | 52.2 ± 1.1 | 47.7 ± 8.7 | 45.6 ± 3.8 | 24.8 ± 5.0 |
| scene-noisy-v0 | 27.1 ± 2.1 | 32.8 ± 3.1 | −¹ | 32.6 ± 5.8 | −¹ |

Table R6. Ablation of κ ∈ {0, 0.5, 1, 2, 5} across environments. Values are success rate (%) averaged over 4 random seeds (mean ± std).
| Task | κ = 0.0 | κ = 0.5 | κ = 1.0 | κ = 2.0 | κ = 5.0 |
|---|---|---|---|---|---|
| pointmaze-teleport-navigate-v0 | 48.2 ± 3.6 | 48.7 ± 4.5 | 48.6 ± 10.5 | 53.8 ± 5.9 | 47.3 ± 2.5 |
| pointmaze-teleport-stitch-v0 | 37.0 ± 4.7 | 37.1 ± 9.1 | 40.8 ± 8.8 | 40.6 ± 7.9 | 40.0 ± 8.9 |
| pointmaze-medium-stitch-v0 | 98.0 ± 0.2 | 98.1 ± 1.2 | 95.3 ± 2.9 | 86.4 ± 8.7 | 87.9 ± 5.5 |
| pointmaze-large-stitch-v0 | 30.3 ± 6.3 | −¹ | 30.0 ± 5.7 | 31.8 ± 5.9 | −¹ |
| antmaze-teleport-navigate-v0 | 49.9 ± 0.3 | 50.5 ± 0.3 | 52.0 ± 0.6 | 48.0 ± 3.1 | 41.9 ± 3.2 |
| antmaze-teleport-stitch-v0 | 41.1 ± 3.0 | 40.5 ± 3.7 | 44.0 ± 5.5 | 41.6 ± 1.6 | 33.1 ± 3.4 |
| antmaze-teleport-explore-v0 | 14.2 ± 2.7 | 17.6 ± 4.5 | 19.8 ± 3.1 | 19.0 ± 1.4 | 26.1 ± 1.4 |
| antmaze-medium-stitch-v0 | 81.0 ± 3.7 | 80.8 ± 4.3 | 76.0 ± 5.5 | 70.2 ± 8.5 | 31.0 ± 6.2 |
| antmaze-large-stitch-v0 | 16.9 ± 2.1 | 13.6 ± 2.5 | 14.4 ± 2.4 | 15.9 ± 0.8 | 11.7 ± 2.1 |
| cube-single-play-v0 | 22.2 ± 6.0 | 24.7 ± 4.3 | 31.3 ± 10.0 | 47.1 ± 9.9 | 46.2 ± 4.4 |
| cube-single-noisy-v0 | 55.2 ± 3.3 | 62.8 ± 5.1 | 64.0 ± 2.7 | 67.4 ± 0.9 | 59.4 ± 3.9 |
| scene-play-v0 | 51.7 ± 3.7 | 52.2 ± 1.1 | 47.7 ± 8.7 | 45.6 ± 3.8 | 24.8 ± 5.0 |
| scene-noisy-v0 | 27.1 ± 2.1 | 32.8 ± 3.1 | −¹ | 32.6 ± 5.8 | −¹ |

Table R7. Results of GC-NF-RLBC and H-GC-NF-RLBC, with HIQL and NFTR for comparison. Values are task success rates (%) averaged over 4 random seeds (mean ± std). GC-NF-RLBC denotes goal-conditioned NF-RLBC; H-GC-NF-RLBC denotes hierarchical goal-conditioned NF-RLBC.
| Task | HIQL | NFTR | GC-NF-RLBC | H-GC-NF-RLBC |
|---|---|---|---|---|
| pointmaze-teleport-navigate-v0 | 18 ± 4 | 53.8 ± 5.9 | 13.2 ± 6.4 | 40.1 ± 6.5 |
| pointmaze-teleport-stitch-v0 | 34 ± 4 | 40.8 ± 8.8 | 19.5 ± 2.8 | 4.9 ± 4.8 |
| pointmaze-medium-stitch-v0 | 74 ± 6 | 98.0 ± 0.2 | 22.3 ± 6.5 | 0.3 ± 0.4 |
| pointmaze-large-stitch-v0 | 13 ± 6 | 30.0 ± 5.7 | 2.0 ± 2.1 | 0.8 ± 1.5 |
| pointmaze-giant-stitch-v0 | 0 ± 0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 |
| antmaze-teleport-navigate-v0 | 42 ± 3 | 52.0 ± 0.6 | 33.2 ± 3.2 | 49.5 ± 2.4 |
| antmaze-teleport-stitch-v0 | 36 ± 2 | 44.0 ± 5.5 | 14.5 ± 1.3 | 36.8 ± 2.0 |
| antmaze-teleport-explore-v0 | 34 ± 15 | 26.1 ± 1.4 | 0.7 ± 0.3 | 0.1 ± 0.1 |
| antmaze-medium-stitch-v0 | 94 ± 1 | 81.0 ± 3.7 | 29.0 ± 3.2 | 61.0 ± 2.2 |
| antmaze-large-stitch-v0 | 67 ± 5 | 15.1 ± 1.3 | 0.2 ± 0.1 | 4.2 ± 2.3 |
| antmaze-giant-stitch-v0 | 2 ± 2 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 |
| cube-single-play-v0 | 15 ± 3 | 47.1 ± 9.9 | 18.7 ± 0.3 | 9.1 ± 2.5 |
| cube-single-noisy-v0 | 41 ± 6 | 67.4 ± 0.9 | 14.1 ± 0.8 | 18.3 ± 1.6 |
| scene-play-v0 | 38 ± 3 | 45.6 ± 3.8 | 10.2 ± 1.2 | 20.4 ± 2.6 |
| scene-noisy-v0 | 25 ± 4 | 32.6 ± 5.8 | 4.1 ± 0.2 | 11.3 ± 0.7 |
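Every table above reports success rate as mean ± std over 3 or 4 random seeds, but the captions do not state which standard-deviation estimator was used. The minimal Python sketch below shows one way such a cell could be produced from per-seed results. The function name `summarize`, the per-seed numbers, and the default choice of the sample (n − 1) estimator are assumptions for illustration, not the authors' evaluation code.

```python
import statistics


def summarize(per_seed_success, sample=True):
    """Aggregate per-seed success rates (%) into (mean, std, 'mean ± std').

    Assumption: the tables do not say which std estimator was used, so
    sample=True uses the n - 1 (sample) estimator and sample=False uses
    the n (population) estimator.
    """
    values = list(per_seed_success)
    if not values:
        raise ValueError("need at least one seed")
    if sample and len(values) < 2:
        raise ValueError("sample std needs at least two seeds")
    mean = statistics.mean(values)
    std = statistics.stdev(values) if sample else statistics.pstdev(values)
    # One decimal place, matching most cells in the tables above.
    return mean, std, f"{mean:.1f} ± {std:.1f}"


# Hypothetical per-seed success rates for one task over 4 seeds
# (illustrative numbers, not taken from any table above).
per_seed = [40.0, 44.0, 48.0, 52.0]
print(summarize(per_seed)[2])                # 46.0 ± 5.2 (sample std)
print(summarize(per_seed, sample=False)[2])  # 46.0 ± 4.5 (population std)
```

With only 3 or 4 seeds the two estimators give visibly different spreads (5.2 versus 4.5 for the same hypothetical runs), so the ± values are best compared only within tables that use the same estimator and seed count.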