Module 2 Assignment: Data preparation and feature pipelines


Scenario

You are advising an analytics team that is choosing a predictive model for an operational decision. The stakeholders are the analytics lead, the domain owner, the operations manager, and the model risk reviewer.

Task

Answer the module question: How do preprocessing choices shape model behavior?
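One way to make the module question concrete is a tiny experiment in which the same nearest-neighbor rule predicts different labels depending on whether the features are rescaled. This is an illustration under stated assumptions: the two training rows, the query point, and the range-based scaling are hypothetical, chosen so that an unscaled dollar-valued feature dominates the distance.

```python
import math

# Hypothetical toy training rows: (features, label).
# Feature 0 is an income in dollars; feature 1 is a score on a 0-10 scale.
train = [
    ((52_000.0, 9.0), 1),
    ((50_000.0, 1.0), 0),
]
query = (51_200.0, 2.0)


def feature_ranges(rows):
    """Per-feature max minus min, learned from the training rows only.

    Assumes no feature is constant; a zero range would cause division by zero.
    """
    columns = zip(*[features for features, _ in rows])
    return [max(col) - min(col) for col in columns]


def nearest_label(rows, point, scale=None):
    """1-nearest-neighbor prediction; optionally divide each feature by `scale` first."""
    def rescale(features):
        if scale is None:
            return list(features)
        return [value / s for value, s in zip(features, scale)]

    target = rescale(point)
    closest = min(rows, key=lambda row: math.dist(rescale(row[0]), target))
    return closest[1]


raw_prediction = nearest_label(train, query)
scaled_prediction = nearest_label(train, query, feature_ranges(train))
print(raw_prediction, scaled_prediction)  # prints: 1 0
```

With raw features, the query is closest to the label-1 row because the income difference swamps the score difference. After each feature is divided by its training range, the score feature carries weight and the prediction flips to 0.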

Use the module lab and course readings to produce a predictive modeling report focused on data preparation and feature pipelines. The report must include a baseline comparison, validation evidence, and a model card. The core task for this module is to build a reproducible train/validation preprocessing pipeline.
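A reproducible train/validation preprocessing pipeline can be sketched with only the Python standard library. The data generator, the `Standardizer` class, the 80/20 split, and the seed value are illustrative assumptions, not the course lab's code. The key point is that the scaler learns its statistics from the training split only; fitting it on all rows before splitting is a common leakage failure mode that can make validation evidence look more optimistic than it should.

```python
import random
import statistics

SEED = 42  # fixed seed so the synthetic data and the split are reproducible


def make_synthetic_rows(n, seed):
    """Hypothetical synthetic tabular data: two numeric features and a binary label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(50.0, 10.0)  # large-scale feature
        x2 = rng.gauss(0.0, 1.0)    # unit-scale feature
        label = int(x1 / 10.0 + x2 + rng.gauss(0.0, 1.0) > 5.0)
        rows.append(([x1, x2], label))
    return rows


def train_validation_split(rows, train_fraction, seed):
    """Shuffle row indices with a seeded RNG, then cut into train and validation."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(len(rows) * train_fraction)
    return [rows[i] for i in order[:cut]], [rows[i] for i in order[cut:]]


class Standardizer:
    """Learns per-feature mean and standard deviation, then rescales rows with them."""

    def fit(self, rows):
        columns = list(zip(*[features for features, _ in rows]))
        self.means = [statistics.fmean(col) for col in columns]
        # Fall back to 1.0 for constant columns so transform never divides by zero.
        self.stds = [statistics.pstdev(col) or 1.0 for col in columns]
        return self

    def transform(self, rows):
        return [
            ([(v - m) / s for v, m, s in zip(features, self.means, self.stds)], label)
            for features, label in rows
        ]


rows = make_synthetic_rows(500, SEED)
train, valid = train_validation_split(rows, 0.8, SEED)
scaler = Standardizer().fit(train)  # statistics come from the training split only
train_z = scaler.transform(train)
valid_z = scaler.transform(valid)   # validation reuses the training statistics
```

In a scikit-learn workflow the same idea is usually expressed by placing a `StandardScaler` inside a `Pipeline` and calling `fit` on the training split only, so the validation rows never influence the learned statistics.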

Required Evidence

  • Define the decision or system boundary in one paragraph.

  • Identify the dataset, proxy data, or evidence source you used. For this module, the proxy data are synthetic tabular observations with features, labels, a train/test split, a baseline score, and error slices.

  • Compare at least two alternatives, baselines, policies, or designs.

  • Report one quantitative result or structured scoring table.

  • Explain two failure modes and one mitigation for each.

  • State what additional evidence would be required before real deployment.
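The baseline comparison, scoring table, and error slices requested above can be prototyped in a few lines. This is a hedged sketch on hypothetical synthetic data, not results from the course lab: the `segment` column, the noise levels, the majority-class baseline, and the threshold rule are all illustrative assumptions. Replace them with the lab's dataset and models before reporting any numbers.

```python
import random
from collections import Counter


def make_rows(n, seed):
    """Hypothetical synthetic rows: (feature, segment, label)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        segment = "A" if rng.random() < 0.7 else "B"
        x = rng.gauss(0.0, 1.0)
        # Assumption: segment B is noisier, so it should show up as a weak error slice.
        noise = rng.gauss(0.0, 0.3 if segment == "A" else 2.0)
        rows.append((x, segment, int(x + noise > 0.0)))
    return rows


rows = make_rows(1000, seed=7)
train, test = rows[:800], rows[800:]

# Alternative 1: majority-class baseline learned from the training labels.
majority = Counter(label for _, _, label in train).most_common(1)[0][0]


def baseline(x):
    return majority


# Alternative 2: a one-feature threshold rule; the cut point maximizes training accuracy.
def fit_threshold(rows):
    candidates = sorted(x for x, _, _ in rows)[::10]
    return max(candidates, key=lambda t: sum(int(x > t) == y for x, _, y in rows))


threshold = fit_threshold(train)


def model(x):
    return int(x > threshold)


def accuracy(predict, rows):
    return sum(predict(x) == y for x, _, y in rows) / len(rows)


# Error slices: score every alternative on the full test set and on each segment.
slices = {
    "all": test,
    "segment A": [r for r in test if r[1] == "A"],
    "segment B": [r for r in test if r[1] == "B"],
}
table = {name: (len(rs), accuracy(baseline, rs), accuracy(model, rs)) for name, rs in slices.items()}

print(f"{'slice':<10} {'n':>4} {'baseline':>9} {'model':>7}")
for name, (n, base_acc, model_acc) in table.items():
    print(f"{name:<10} {n:>4} {base_acc:>9.3f} {model_acc:>7.3f}")
```

Reading the table slice by slice is what surfaces a failure mode: an overall score can look acceptable while one segment lags well behind, which is the kind of gap the memo's risks section should explain and pair with a mitigation.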

Submission

Submit the completed notebook and a 900–1,200-word memo. The memo must include clear headings for context, method, evidence, risks, recommendation, and open questions.

```python
# Assignment workspace for Module 2: Data preparation and feature pipelines
module = 2
decision = "How do preprocessing choices shape model behavior?"
artifact = "predictive modeling report with baseline comparison, validation evidence, and model card focused on data preparation and feature pipelines: Build a reproducible train/validation preprocessing pipeline."

# One entry per alternative you compare; the assignment requires at least two.
alternatives = [
    {"option": "baseline_or_manual_process", "strength": "", "risk": "", "evidence": ""},
    {"option": "ai_assisted_or_advanced_option", "strength": "", "risk": "", "evidence": ""},
]

# The recommendation, plus the evidence and monitoring needed before and during a pilot.
recommendation = {
    "decision": decision,
    "recommended_option": "",
    "minimum_evidence_before_pilot": [],
    "monitoring_metric": "",
    "rollback_trigger": "",
}

# The notebook displays this final expression as the cell output.
{"module": module, "artifact": artifact, "alternatives": alternatives, "recommendation": recommendation}
```
```
{'module': 2,
 'artifact': 'predictive modeling report with baseline comparison, validation evidence, and model card focused on data preparation and feature pipelines: Build a reproducible train/validation preprocessing pipeline.',
 'alternatives': [{'option': 'baseline_or_manual_process',
   'strength': '',
   'risk': '',
   'evidence': ''},
  {'option': 'ai_assisted_or_advanced_option',
   'strength': '',
   'risk': '',
   'evidence': ''}],
 'recommendation': {'decision': 'How do preprocessing choices shape model behavior?',
  'recommended_option': '',
  'minimum_evidence_before_pilot': [],
  'monitoring_metric': '',
  'rollback_trigger': ''}}
```
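Before submitting, it may help to confirm that no field in the workspace is still blank. This is a minimal sketch: `missing_fields` is a hypothetical helper, not part of the course template, and the example reuses the blank template values from the workspace cell.

```python
def missing_fields(alternatives, recommendation):
    """List template fields that are still empty (empty strings, empty lists, or None)."""
    missing = []
    if len(alternatives) < 2:
        missing.append("alternatives: need at least two options")
    for i, alt in enumerate(alternatives):
        for key, value in alt.items():
            if value in ("", [], None):
                missing.append(f"alternatives[{i}].{key}")
    for key, value in recommendation.items():
        if value in ("", [], None):
            missing.append(f"recommendation.{key}")
    return missing


# Blank template values, copied from the workspace cell.
alternatives = [
    {"option": "baseline_or_manual_process", "strength": "", "risk": "", "evidence": ""},
    {"option": "ai_assisted_or_advanced_option", "strength": "", "risk": "", "evidence": ""},
]
recommendation = {
    "decision": "How do preprocessing choices shape model behavior?",
    "recommended_option": "",
    "minimum_evidence_before_pilot": [],
    "monitoring_metric": "",
    "rollback_trigger": "",
}

gaps = missing_fields(alternatives, recommendation)
print(len(gaps), gaps[0])  # prints: 10 alternatives[0].strength
```

An empty result does not mean the content is good, only that every field has been addressed; the evidence behind each entry still has to be in the notebook.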

Acceptance Criteria

Your submission is complete only if another reviewer can reproduce your reasoning from the evidence you provide. You do not need production-grade data, but you must be explicit about proxy-data limits and what would change with real institutional data.
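One lightweight way to document reproducibility, offered as an assumption rather than a course requirement, is to fingerprint the pipeline's outputs: rerun it with the same seed and confirm that the digest matches. `run_pipeline` and `fingerprint` are hypothetical helpers, and the centering step stands in for whatever preprocessing the lab actually uses. Matching digests are only expected on the same Python version and code.

```python
import hashlib
import json
import random


def run_pipeline(seed):
    """Hypothetical end-to-end run: synthetic data, seeded split, train-only centering."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0.0, 1.0), rng.gauss(5.0, 2.0)] for _ in range(100)]
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    train = [rows[i] for i in order[:80]]
    valid = [rows[i] for i in order[80:]]
    means = [sum(col) / len(col) for col in zip(*train)]  # learned from train only

    def center(data):
        return [[v - m for v, m in zip(row, means)] for row in data]

    return {"seed": seed, "train": center(train), "valid": center(valid)}


def fingerprint(result):
    """SHA-256 of a canonical JSON dump; identical runs give identical digests."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


first = fingerprint(run_pipeline(seed=2024))
second = fingerprint(run_pipeline(seed=2024))
different = fingerprint(run_pipeline(seed=2025))
print(first == second, first == different)  # prints: True False
```

Recording the seed and the digest in the memo gives a reviewer a concrete check: if their rerun produces a different digest, something in the data or preprocessing has changed.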