Module 2 Assignment: Data preparation and feature pipelines

Scenario

You are advising an analytics team that is choosing a predictive model for an operational decision. The stakeholders are the analytics lead, the domain owner, the operations manager, and the model risk reviewer.

Task

Answer the module question: How do preprocessing choices shape model behavior?

Use the module lab and course readings to produce a predictive modeling report with a baseline comparison, validation evidence, and a model card, all focused on data preparation and feature pipelines. The core deliverable is a reproducible train/validation preprocessing pipeline.
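As a starting point, here is a minimal sketch of what "reproducible train/validation preprocessing" can mean in practice: the split is seeded, and the scaling statistics are estimated on the training rows only, then reused unchanged on the validation rows. The data generator, the `TrainOnlyScaler` class, and the seed values are hypothetical names chosen for illustration; they are not part of the module lab. The sketch uses only the Python standard library.

```python
import random
import statistics


def make_synthetic_rows(n, seed=0):
    """Hypothetical synthetic data: one numeric feature and one binary label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(50.0, 10.0)
        y = 1 if x + rng.gauss(0.0, 5.0) > 50.0 else 0
        rows.append((x, y))
    return rows


def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy with a fixed seed, then cut into train and validation."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


class TrainOnlyScaler:
    """Illustrative standardizer: statistics come from fit() data only."""

    def fit(self, values):
        self.mean_ = statistics.fmean(values)
        # Fall back to 1.0 so a constant feature does not divide by zero.
        self.std_ = statistics.pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


rows = make_synthetic_rows(200, seed=42)
train, valid = split_rows(rows, train_fraction=0.8, seed=42)

# Fit on training features only; validation rows never influence the statistics.
scaler = TrainOnlyScaler().fit([x for x, _ in train])
train_x = scaler.transform([x for x, _ in train])
valid_x = scaler.transform([x for x, _ in valid])

print("train mean after scaling:", round(statistics.fmean(train_x), 6))
print("valid mean after scaling:", round(statistics.fmean(valid_x), 6))
```

The validation mean will generally not be exactly zero, which is expected: the validation set is scaled with the training statistics. Fitting the scaler on all rows before splitting would leak validation information into preprocessing, which is one concrete way a preprocessing choice can shape measured model behavior.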

Required Evidence

Define the decision or system boundary in one paragraph.
Identify the dataset, proxy data, or evidence source you used (for this module: synthetic tabular observations with features, labels, a train/test split, a baseline score, and error slices).
Compare at least two alternatives, baselines, policies, or designs.
Report one quantitative result or structured scoring table.
Explain two failure modes and one mitigation for each.
State what additional evidence would be required before real deployment.
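The comparison, quantitative result, and error-slice items above can be sketched together as a small scoring table. This is an illustrative example only: the synthetic generator, the `segment` column, and the two options (`majority_baseline` and `threshold_rule`) are assumptions made up for the sketch, not the lab's data or required models.

```python
import random


def make_rows(n, seed=0):
    """Hypothetical synthetic rows; segment B is given noisier labels than A."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        segment = rng.choice(["A", "B"])
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 0.5 if segment == "A" else 1.5)
        rows.append({"segment": segment, "x": x, "label": int(x + noise > 0)})
    return rows


def accuracy(rows, predict):
    """Share of rows where the prediction matches the label."""
    if not rows:
        return float("nan")
    return sum(predict(r) == r["label"] for r in rows) / len(rows)


rows = make_rows(400, seed=7)
train, test = rows[:300], rows[300:]  # rows are already in random order

# Option 1: always predict the majority class seen in training.
majority = int(sum(r["label"] for r in train) * 2 >= len(train))


def baseline(row):
    return majority


# Option 2: a one-feature threshold rule.
def threshold_rule(row):
    return int(row["x"] > 0)


table = []
for name, model in [("majority_baseline", baseline), ("threshold_rule", threshold_rule)]:
    entry = {"option": name, "overall": accuracy(test, model)}
    for seg in ("A", "B"):
        slice_rows = [r for r in test if r["segment"] == seg]
        entry["segment_" + seg] = accuracy(slice_rows, model)
    table.append(entry)

for entry in table:
    print(entry)
```

Reporting per-segment accuracy next to the overall figure makes it easier to spot a failure mode that an aggregate score hides, such as one option performing acceptably overall but poorly on a single slice.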

Submission

Submit the completed notebook plus a 900-1200 word memo. The memo must include clear headings for context, method, evidence, risks, recommendation, and open questions.

# Assignment workspace for Module 2: Data preparation and feature pipelines
module = 2
decision = "How do preprocessing choices shape model behavior?"
artifact = "predictive modeling report with baseline comparison, validation evidence, and model card focused on data preparation and feature pipelines: Build a reproducible train/validation preprocessing pipeline."

alternatives = [
    {"option": "baseline_or_manual_process", "strength": "", "risk": "", "evidence": ""},
    {"option": "ai_assisted_or_advanced_option", "strength": "", "risk": "", "evidence": ""},
]

recommendation = {
    "decision": decision,
    "recommended_option": "",
    "minimum_evidence_before_pilot": [],
    "monitoring_metric": "",
    "rollback_trigger": "",
}

{"module": module, "artifact": artifact, "alternatives": alternatives, "recommendation": recommendation}

{'module': 2,
 'artifact': 'predictive modeling report with baseline comparison, validation evidence, and model card focused on data preparation and feature pipelines: Build a reproducible train/validation preprocessing pipeline.',
 'alternatives': [{'option': 'baseline_or_manual_process',
   'strength': '',
   'risk': '',
   'evidence': ''},
  {'option': 'ai_assisted_or_advanced_option',
   'strength': '',
   'risk': '',
   'evidence': ''}],
 'recommendation': {'decision': 'How do preprocessing choices shape model behavior?',
  'recommended_option': '',
  'minimum_evidence_before_pilot': [],
  'monitoring_metric': '',
  'rollback_trigger': ''}}
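Before submitting, it may help to confirm that no field in the workspace is still blank. The helper below is a hypothetical convenience, not part of the assignment template: `find_blank_fields` walks a nested structure and returns the path of every empty string or empty list. The `workspace` dictionary here is a shortened stand-in for the structure shown above so that the example runs on its own.

```python
def find_blank_fields(value, path=""):
    """Return paths of empty strings and empty lists in nested dicts/lists."""
    blanks = []
    if isinstance(value, dict):
        for key, item in value.items():
            child = f"{path}.{key}" if path else key
            blanks += find_blank_fields(item, child)
    elif isinstance(value, list):
        if not value:
            blanks.append(path)
        for index, item in enumerate(value):
            blanks += find_blank_fields(item, f"{path}[{index}]")
    elif value == "":
        blanks.append(path)
    return blanks


# Shortened stand-in for the assignment workspace (illustrative only).
workspace = {
    "alternatives": [
        {"option": "baseline_or_manual_process", "strength": "", "risk": "", "evidence": ""},
    ],
    "recommendation": {
        "recommended_option": "",
        "minimum_evidence_before_pilot": [],
    },
}

for field in find_blank_fields(workspace):
    print("still blank:", field)
```

An empty result means every field has some content. It does not check whether the content is adequate, so it complements a reviewer's reading rather than replacing it.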

Acceptance Criteria

Your submission is complete only if another reviewer can reproduce your reasoning from the evidence you provide. You do not need production-grade data, but you must state the limits of your proxy data explicitly and explain what would change with real institutional data.