Objective : The goal of this assignment was to uncover implicit biases in GPT-4o through a systematic red-teaming approach. Biases were explored by analyzing the model’s responses to prompts involving human roles and behaviors.
Summary of my explorations in algorithmic fairness, bias detection, and ethical considerations within machine learning systems, completed as part of the coursework for AI, Decision-Making, and Society at MIT CSAIL. These problem focus on applying theoretical fairness frameworks, developing practical evaluations, and implementing mitigation strategies to address societal and ethical challenges posed by AI.
The results revealed significant discrepancies in descriptions based on beverage preference:
This red-teaming approach demonstrated the utility of structured bias evaluations in LLMs. It emphasized the importance of uncovering implicit assumptions to ensure fair and representative outputs across diverse groups.
Here’s a complete set of tables summarizing each PSET in a clear, academic style. These tables provide structured insights into the objectives, methodologies, findings, and impacts for each problem set. You can include them in your document for clarity and conciseness.
Section | Details |
---|---|
Objective | Identify and analyze implicit biases in GPT-4o through red-teaming. |
Methodology |
- Developed comparative prompts evaluating
personality, work habits, and life purpose. - Analyzed latent stereotypes in LLM outputs. |
Findings |
- Coffee drinkers: Described as
"driven" and "energetic" but prone to burnout. - Tea drinkers: Described as "calm," "balanced," and reflective. |
Impact | Demonstrated structured red-teaming to detect latent stereotypes, informing fairness-aware AI design. |
Table : Red-Teaming Large Language Models for Bias Detection
Through this and other problems like these, I applied fairness frameworks, bias detection methods, and privacy-preserving strategies to evaluate and address ethical challenges in AI systems. These works demonstrate my ability to design rigorous evaluations, identify systemic biases, and implement solutions that align with societal values. Such methodologies are essential for building AI systems that are fair, responsible, and impactful.