A retrospective board built around visible Jira work
Built for retrospectives and team reflection
Project page / SeeCodes
Team Competition
Visible contributors
3
Jira visibility enforced
Visible solved tasks
11
Done tasks in scope
Relative effort produced
764
Composite effort units in scope
Monthly baseline
100
Typical visible completed task
AI-generated retrospective board
“Effort” is a productivity-oriented composite of active minutes, changed LOC, files changed, and architecture / logic / UI-specification signals.
Raw effort = active minutes × 1.8 + LOC changed × 0.18 + files changed × 3, plus bonuses for architecture (+24), logic (+14), and UI / specification (+10). A relative score of 100 means roughly “typical for this month.”
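The raw-effort formula above can be sketched directly. The coefficients (1.8, 0.18, 3) and bonuses (+24, +14, +10) come from the page; the function name and signature are illustrative, not a real API.

```python
def raw_effort(active_minutes: float, loc_changed: int, files_changed: int,
               architecture: bool = False, logic: bool = False,
               ui_spec: bool = False) -> float:
    """Composite raw effort for one task, using the page's heuristic weights."""
    score = active_minutes * 1.8 + loc_changed * 0.18 + files_changed * 3
    if architecture:
        score += 24   # bonus for architecture-level work
    if logic:
        score += 14   # bonus for logic-level work
    if ui_spec:
        score += 10   # bonus for UI / specification work
    return score

# Example: a 23-minute, 78-LOC, 3-file change flagged as architecture work.
print(raw_effort(23, 78, 3, architecture=True))  # ≈ 88.44
```

Dividing a raw score by the month's typical raw effort and multiplying by 100 then yields the relative score the board displays.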
Competition Board
Sort by relative effort, solved tasks, logic, architecture, UI/spec, or contributor name.
Avery Chen
5 solved • 7 contributed tasks
Relative effort 312 • Active minutes 184 • LOC changed 1210 • Files changed 47 • Avg solved-task effort 128
Priya Shah
4 solved • 6 contributed tasks
Relative effort 254 • Active minutes 162 • LOC changed 990 • Files changed 39 • Avg solved-task effort 117
Jordan Miles
3 solved • 5 contributed tasks
Relative effort 198 • Active minutes 141 • LOC changed 740 • Files changed 31 • Avg solved-task effort 109
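The board's sort options can be sketched as key functions over contributor records. The record shape and field names below are assumptions; the figures are the ones shown on the board.

```python
contributors = [
    {"name": "Avery Chen", "relative_effort": 312, "solved": 5},
    {"name": "Priya Shah", "relative_effort": 254, "solved": 4},
    {"name": "Jordan Miles", "relative_effort": 198, "solved": 3},
]

def board_order(rows, key="relative_effort"):
    """Order contributor rows by a chosen board column."""
    # Names sort ascending; numeric columns sort descending (highest first).
    return sorted(rows, key=lambda r: r[key], reverse=(key != "name"))

print([r["name"] for r in board_order(contributors, "solved")])
```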
Solved Task Board
Top solved tasks in scope with relative effort and transparent raw-effort drivers.
PLAT-221
Rework auth token rotation
Assignee Avery Chen • 23 active min • 78 LOC changed • 3 files changed
PAY-84
Stabilize billing guardrails
Assignee Priya Shah • 22 active min • 78 LOC changed • 4 files changed
WEB-97
Refine logout and error messaging
Assignee Jordan Miles • 19 active min • 61 LOC changed • 3 files changed
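The raw-effort drivers listed on each task card can be recombined with the page's coefficients. The semantic bonuses are omitted here because the cards do not say which signals fired, so these are hypothetical base scores only.

```python
# (active minutes, LOC changed, files changed) per the task cards above
tasks = {
    "PLAT-221": (23, 78, 3),
    "PAY-84":   (22, 78, 4),
    "WEB-97":   (19, 61, 3),
}

def base_effort(minutes: float, loc: int, files: int) -> float:
    """Raw effort before semantic bonuses, per the page's weights."""
    return minutes * 1.8 + loc * 0.18 + files * 3

for key, drivers in tasks.items():
    print(key, round(base_effort(*drivers), 2))
```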
How the page reads effectiveness from effort
Important nuance: the formula is an effort proxy
Raw effort = active minutes × 1.8
+ LOC changed × 0.18
+ files changed × 3
+ (architecture signal ? 24 : 0)
+ (logic signal ? 14 : 0)
+ (UI / spec signal ? 10 : 0)
Relative score ≈ raw effort / monthly typical raw effort × 100
100 ≈ "typical for this month"
The strongest scientific case is for the choice of inputs and for the monthly normalization step. The exact coefficients are still product heuristics chosen so that time, churn, diffusion, and semantically heavier work all stay visible in one interpretable index. The next section makes those research links explicit.
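The normalization step can be sketched as below. The page does not pin down how the "monthly typical raw effort" is computed, so the median used here is an assumption; a mean or trimmed mean would follow the same pattern.

```python
from statistics import median

def relative_scores(raw_efforts: list[float]) -> list[int]:
    """Scale each raw effort against the month's typical (assumed: median) value."""
    baseline = median(raw_efforts)
    return [round(raw / baseline * 100) for raw in raw_efforts]

# Four visible completed tasks in one month; baseline = median = 88.
print(relative_scores([48, 80, 96, 160]))  # → [55, 91, 109, 182]
```

A task scoring 100 then simply sits at that month's baseline, which is why scores should not be compared across unrelated months or teams.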
How to read a score of 100
- 100 ≈ a typical visible completed task in the current month.
- 120 means roughly 20% above that month’s visible baseline, not 20% "better" than another team in another repository.
- 60 means lighter than that month’s typical visible completed task, not low value or weak performance.
Where “effectiveness” actually shows up
- Solved count shows whether effort is turning into finished work.
- Average solved-task effort shows whether someone is closing lighter or heavier completed tasks.
- Dominant-area labels show whether contribution skewed toward architecture, logic, or UI/spec work.
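The two effectiveness readouts above can be sketched from a contributor's solved tasks. The task tuples of (raw effort, dominant area) are hypothetical illustration data.

```python
from collections import Counter

# Hypothetical solved tasks for one contributor: (raw effort, dominant area).
solved = [(128, "logic"), (96, "architecture"), (140, "logic")]

# Average solved-task effort: lighter vs. heavier completed work.
avg_effort = sum(effort for effort, _ in solved) / len(solved)

# Dominant-area label: where the contribution skewed this month.
dominant_area = Counter(area for _, area in solved).most_common(1)[0][0]

print(round(avg_effort, 1), dominant_area)
```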
Active minutes × 1.8
Minutes with detected work activity give the score a focused-work component, but the page explicitly frames them as a proxy for retrospective context rather than payroll time.
Changed LOC × 0.18
Code churn captures implementation magnitude. The lower coefficient prevents raw line volume from overpowering every other signal on the board.
Files changed × 3
Cross-file changes usually imply broader reasoning, more coordination, and more places where side effects can appear, so the model gives diffusion visible weight.
Semantic bonuses
Architecture (+24), logic (+14), and UI / specification (+10) bonuses stop the model from pretending that every edit has the same blast radius or product meaning.
The giants our Effectiveness formula stands on
Built on strong ideas, designed for real teams
Research link
The SPACE of Developer Productivity
Backs: The overall framing of the metric as a multidimensional retrospective index rather than a single universal truth.
Connection to our formula: Supports using activity, solved work, and context together, while treating 100 as a month-relative baseline instead of a universal productivity grade.
Research link
Mind the Gap: On the Relationship Between Automatically Measured and Self-Reported Productivity
Backs: Combining observable telemetry with subjective interpretation instead of pretending automation captures the whole story.
Connection to our formula: Supports positioning the score as a proxy for retrospection, not payroll or surveillance, and supports user-level or context-level baselines over one absolute benchmark.
Research link
Software Developers' Perceptions of Productivity
Backs: Reading effort next to solved work rather than equating raw activity with productivity.
Connection to our formula: Supports showing solved count and average solved-task effort beside the formula so the board reflects finished outcomes, task size, and interruption-sensitive work.
Research link
The Work Life of Developers: Activities, Switches and Perceived Productivity
Backs: Using activity and focus signals as contextual proxies while keeping the measure holistic and personal.
Connection to our formula: Supports active minutes as one ingredient in a broader board and reinforces that the score should not be compared as an absolute truth across unrelated people, months, or teams.
Research link
Using Logs Data to Identify When Software Engineers Experience Flow or Focused Work
Backs: The active-minutes term as a defensible proxy for focused work.
Connection to our formula: Supports the page language that active minutes are about focus and context for retrospective interpretation, not attendance or payroll time.
Research link
Does Measuring Code Change Improve Fault Prediction?
Backs: LOC changed and code churn as meaningful signals for change magnitude.
Connection to our formula: Supports keeping changed LOC in the formula as a visible effort dimension while using a modest coefficient so line volume does not dominate the whole score.
Research link
An Industrial Study on the Risk of Software Changes
Backs: Weighting larger and more structurally involved changes more heavily than tiny edits.
Connection to our formula: Supports giving code churn real influence in the score because larger changes often carry more coordination cost and risk than equally counted ticket completions.
Research link
Predicting Faults Using the Complexity of Code Changes
Backs: Files changed and change diffusion as visible effort and risk dimensions beyond raw LOC.
Connection to our formula: Supports the files-changed term because more scattered changes are harder to reason about, easier to miss in review, and often heavier than the same LOC concentrated in one place.
Research link
How, and Why, Process Metrics Are Better
Backs: Preferring process and change-side signals over raw static code metrics alone.
Connection to our formula: Supports the overall formula shape because active time, churn, and diffusion are all process-side signals that say more about current change work than static repository snapshots do.
Research link
An Empirical Study of Just-in-Time Defect Prediction using Cross-Project Models
Backs: Normalization of diffusion-like metrics instead of presenting raw counts as the final meaning.
Connection to our formula: Supports exposing a relative normalized score where 100 means typical for the visible month, rather than asking readers to interpret raw effort values without context.
Research link
Variations on Using Propagation Costs to Measure Architecture Modifiability Properties
Backs: The architecture bonus, because structural counts alone miss the semantics and propagation cost of architectural work.
Connection to our formula: Supports giving architecture work extra credit beyond minutes, LOC, and file counts when the change has broader modifiability and system-shape implications.
Research link
Impact of Requirements Volatility on Software Architecture: How Do Software Teams Keep Up with Ever-Changing Requirements?
Backs: The link between requirements or specification work and downstream architectural cost.
Connection to our formula: Supports treating spec-facing work as real effort even when code volume is modest, because requirement volatility can increase architecture complexity and coordination cost.
Research link
Ambiguous Software Requirement Specification Detection
Backs: The UI / spec bonus, especially the specification half, because ambiguous specs create risk, rework, and delivery cost.
Connection to our formula: Supports keeping a spec signal in the model even if the exact +10 value stays heuristic; the page should not pretend only raw code volume matters.
Why teams can trust the shape of this model
Research-backed direction. Product-shaped scoring.
Great work is bigger than one signal
Developer-productivity research keeps landing on the same point: strong contribution is multidimensional. That is why this model combines activity, change size, diffusion, and semantic signals instead of pretending one number can explain everything by itself.
Meaningful change should count
Research on code churn shows that change volume can reflect real implementation weight. That makes changed LOC a useful input here, not as a vanity metric, but as one visible part of how much work a completed change likely carried.
Broad changes usually carry more load
Studies on software-change risk repeatedly show that work spread across more files or subsystems is harder to reason about and easier to miss in review. That is why the model gives cross-file diffusion visible weight.
System-shaping work deserves extra visibility
Architecture and specification work often create cost, coordination, and downstream impact that raw structural counts miss. The bonus signals help the board recognize work that changes system shape, logic complexity, or product-definition clarity.
What the research world keeps pointing to
- Developer productivity is multidimensional, so no single activity metric should define contribution.
- Observable telemetry is useful, but it works best when read in context rather than as a universal truth.
- Code churn helps represent implementation magnitude when interpreted as relative change weight.
- Files touched and change diffusion are strong signals for broader, riskier, harder-to-review work.
- Architecture and specification signals matter because semantic impact can exceed what raw counts show.
- Normalization matters because teams need a readable month-relative baseline, not an absolute productivity grade.
How strong teams use the score well
- Use the score to guide retrospectives, not to replace judgment.
- Compare people only inside similar scope, visibility, and time windows.
- Read effort next to solved work, review quality, and task difficulty.
- Treat 100 as a month-relative baseline, not as a target for human worth.
- Never turn the score into surveillance, payroll logic, or a one-number performance system.
Where teams use it
- Run retrospectives with a shared view of solved work, contributor mix, and relative task effort.
- Celebrate specialists and generalists without forcing the conversation into raw ticket counts.
- Spot work that needed multiple contributors or carried unusually high relative effort.
- Find follow-up topics for pairing, knowledge sharing, or planning quality in the next sprint.
Best used for retrospectives and coaching