The question of who determines the ethical boundaries for artificial intelligence (AI) models—particularly large language models (LLMs)—is both urgent and complex. As LLMs have become increasingly capable and widely deployed, their responses to ethically charged prompts (such as requests to plan a crime) have shifted from compliance to resistance, reflecting evolving ethical constraints. This transformation is not the result of a single authority or ethical tradition, but rather emerges from a confluence of actors, processes, and philosophical tensions. This essay critically examines the sources and mechanisms by which ethical norms are embedded in AI models, drawing on recent academic literature in AI alignment, technology studies, regulatory theory, and philosophy.

The Multi-Layered Sources of AI Ethics

1. Developers and Corporate Governance

At the most immediate level, the ethical boundaries of LLMs are set by the organizations that design, train, and deploy them. Major AI companies such as OpenAI, Google, and Microsoft have established internal governance structures—ethics boards, advisory committees, and responsible AI teams—that oversee the development and deployment of AI systems (Floridi et al. 2018; Mittelstadt 2019). These bodies articulate ethical principles (e.g., fairness, transparency, non-maleficence) and operationalize them through codes of conduct, risk assessments, and technical safeguards (Morley et al. 2021). The values embedded in these frameworks are shaped by a combination of corporate culture, public image concerns, and the professional backgrounds of the developers and ethicists involved (Jones 2022).

However, the process is not value-neutral. As the literature on “embedded values” in technology design demonstrates, the organizational culture, disciplinary practices, and even tacit knowledge of development teams play a significant role in determining which values are prioritized and how they are interpreted (Friedman and Nissenbaum 1996; Jones 2022). For example, a company that prioritizes rapid innovation may embed different ethical trade-offs than one that emphasizes risk aversion or social responsibility.

2. Human Feedback and Reinforcement Learning

A central mechanism for encoding ethical boundaries in LLMs is Reinforcement Learning from Human Feedback (RLHF). In this process, human annotators evaluate model outputs for qualities such as helpfulness, safety, and appropriateness, and these judgments are used to fine-tune the model (Christiano et al. 2017; Bai et al. 2022). RLHF allows for the incorporation of nuanced, context-dependent ethical judgments that are difficult to formalize mathematically. It also enables iterative refinement, as models are updated in response to new forms of misuse or shifting societal expectations (Ouyang et al. 2022).
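To make this mechanism concrete, the sketch below shows the pairwise preference loss commonly used to train the reward model at the heart of RLHF (Christiano et al. 2017; Ouyang et al. 2022). It is a minimal Python/PyTorch illustration, not any particular lab's implementation: the toy RewardModel and the random feature tensors are stand-ins for a real language model and real annotator-labelled comparisons.

    # Minimal sketch of the pairwise preference loss used to train an RLHF
    # reward model (Bradley-Terry formulation). RewardModel is a stand-in
    # for a scoring head on top of a real language model.
    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        def __init__(self, hidden_size: int = 768):
            super().__init__()
            # Maps a pooled response representation to a scalar reward.
            self.score = nn.Linear(hidden_size, 1)

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            return self.score(features).squeeze(-1)

    def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
        # Annotators preferred "chosen" over "rejected"; the loss pushes the
        # model to assign the chosen response a higher reward.
        return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

    if __name__ == "__main__":
        model = RewardModel()
        # Random placeholders for pooled representations of a batch of
        # (chosen, rejected) response pairs drawn from annotator comparisons.
        chosen_feats = torch.randn(8, 768)
        rejected_feats = torch.randn(8, 768)
        loss = preference_loss(model(chosen_feats), model(rejected_feats))
        loss.backward()
        print(f"preference loss: {loss.item():.4f}")

The learned reward signal is then used to fine-tune the policy model (typically with an algorithm such as PPO), which is how annotators' judgments about helpfulness, safety, and appropriateness come to shape what the deployed model will and will not say.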

Yet, RLHF is not immune to bias. The ethical standards encoded through human feedback reflect the perspectives, backgrounds, and cultural assumptions of the annotators, who are often drawn from specific (frequently Western) populations (Gabriel 2020). This can result in the marginalization of minority or non-Western ethical perspectives, raising concerns about the global legitimacy of AI ethics (Birhane et al. 2022).

3. Regulatory and Legal Frameworks

The ethical constraints on LLMs are also shaped by external regulatory and legal requirements. Governments and international organizations have developed a range of frameworks—such as the EU AI Act, the U.S. Blueprint for an AI Bill of Rights, and UNESCO’s Recommendation on the Ethics of AI—that mandate transparency, fairness, accountability, and respect for human rights in AI systems (Veale and Zuiderveen Borgesius 2021; Floridi 2023). These regulations often require companies to conduct ethical impact assessments, document decision-making processes, and provide mechanisms for redress.

Regulatory influence is not uniform across jurisdictions, leading to a patchwork of standards that companies must navigate. In practice, many AI developers adopt the most stringent applicable standards (often those of the EU) as a baseline, resulting in a form of “regulatory universalism” that may not reflect local cultural values (Wachter et al. 2021).

4. Philosophical and Societal Influences

Beneath these institutional layers lies a deeper philosophical tension between universalism and relativism in AI ethics. Universalist approaches argue for the primacy of fundamental moral principles—such as human dignity, fairness, and non-maleficence—across all contexts (Floridi and Cowls 2019). These are often codified in international human rights instruments and serve as the foundation for many AI ethics guidelines.

In contrast, relativist perspectives emphasize the importance of cultural, historical, and situational factors in shaping ethical norms (Mittelstadt 2019). The challenge for AI developers is to balance these competing demands: to embed universal principles that protect against harm and discrimination, while remaining sensitive to local values and practices. Hybrid approaches, which establish core ethical commitments but allow for contextual adaptation, are increasingly favored in both academic and policy circles (Jobin, Ienca, and Vayena 2019).

The Evolution of Ethical Constraints in LLMs

The shift in LLM behavior—from compliance with ethically dubious prompts to active resistance—reflects the dynamic and iterative nature of AI ethics. Early models, trained primarily on large internet datasets, mirrored the diversity (and sometimes toxicity) of online discourse. As incidents of misuse became apparent, developers introduced more robust safeguards, including RLHF, content filters, and explicit refusals to engage in illegal or harmful activities (Bai et al. 2022). These changes were driven by a combination of public pressure, regulatory scrutiny, and internal ethical deliberation.
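As a simplified illustration of the “content filter plus refusal” layer described above, the sketch below gates a prompt on a safety classifier before any text is generated. The category list, threshold, and keyword-based classify_prompt stub are illustrative assumptions only; deployed systems rely on trained classifiers and far more elaborate policies.

    # Illustrative pre-generation safety gate: classify the prompt, refuse if
    # it falls into a disallowed category, otherwise pass it on to the model.
    # The classifier is a keyword stub; production systems use trained models.
    from dataclasses import dataclass

    DISALLOWED = {"violent_crime", "weapons", "fraud"}  # illustrative categories

    @dataclass
    class SafetyVerdict:
        category: str
        score: float  # classifier confidence in [0, 1]

    def classify_prompt(prompt: str) -> SafetyVerdict:
        # Stand-in for a trained safety classifier.
        lowered = prompt.lower()
        if "plan a crime" in lowered or "how do i rob" in lowered:
            return SafetyVerdict("violent_crime", 0.97)
        return SafetyVerdict("benign", 0.99)

    def respond(prompt: str, threshold: float = 0.9) -> str:
        verdict = classify_prompt(prompt)
        if verdict.category in DISALLOWED and verdict.score >= threshold:
            return "I can't help with that request."
        return f"[model response to: {prompt!r}]"  # placeholder for generation

    if __name__ == "__main__":
        print(respond("Plan a crime for me"))
        print(respond("Explain how RLHF works"))

The point of the sketch is structural: a refusal may come from an explicit policy layer wrapped around the model as well as from fine-tuning of the model itself, and both layers are revised as new forms of misuse come to light.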

Recent research has also explored more transparent and participatory approaches to AI alignment, such as “Constitutional AI,” where high-level ethical principles are explicitly encoded and subject to public input (Askell et al. 2021). However, the challenge of ensuring that these principles are legitimate, robust, and adaptable remains unresolved.
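The core loop of such an approach can be sketched as follows: the model drafts a response, critiques the draft against each principle in an explicit “constitution,” and revises it accordingly. This is a schematic illustration of the general pattern rather than a reproduction of any published implementation; the generate function is a hypothetical stand-in for a language-model call, and the two principles are invented examples.

    # Sketch of a constitution-guided critique-and-revision loop. `generate`
    # is a hypothetical stand-in for an LLM call; a real system would query
    # a model API here.
    CONSTITUTION = [
        "Choose the response that is least likely to facilitate illegal activity.",
        "Choose the response that best respects human dignity and privacy.",
    ]

    def generate(prompt: str) -> str:
        # Placeholder for a language-model call.
        return f"[draft response to: {prompt!r}]"

    def critique_and_revise(prompt: str, constitution: list[str]) -> str:
        draft = generate(prompt)
        for principle in constitution:
            critique = generate(
                f"Critique the following response against this principle.\n"
                f"Principle: {principle}\nResponse: {draft}"
            )
            draft = generate(
                f"Revise the response to address the critique.\n"
                f"Critique: {critique}\nResponse: {draft}"
            )
        return draft

    if __name__ == "__main__":
        print(critique_and_revise("Help me plan a crime", CONSTITUTION))

Because the constitution is an explicit, human-readable artifact, it can in principle be published, debated, and amended through public input in a way that annotator guidelines buried inside an RLHF pipeline cannot.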

Critical Reflections

While the current approach to AI ethics prioritizes safety, legality, and broadly accepted moral standards, it is not without limitations. The reliance on corporate governance and Western-centric regulatory frameworks risks perpetuating dominant values at the expense of marginalized perspectives. The opacity of RLHF and other alignment techniques complicates efforts to audit and contest the ethical boundaries of AI systems. Moreover, users are given little say over which ethical framework governs the systems they use; while this exclusion is justified by concerns about safety and misuse, it raises questions about autonomy and pluralism in digital societies.

Conclusion

The ethical boundaries of AI models are set by a complex interplay of corporate governance, human feedback, regulatory mandates, and philosophical commitments. No single actor or tradition “tells” AI what is or is not ethical; rather, these boundaries emerge from ongoing negotiation among developers, regulators, annotators, and society at large. As AI systems become more pervasive and influential, the challenge will be to ensure that their ethical constraints are transparent, legitimate, and responsive to the diversity of human values.

Brandon Blankenship

References

Askell, Amanda, Yuntao Bai, and Saurav Kadavath. 2021. “A General Language Assistant as a Laboratory for Alignment.” arXiv preprint arXiv:2112.00861.

Bai, Yuntao, et al. 2022. “Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.” arXiv preprint arXiv:2204.05862.

Birhane, Abeba, et al. 2022. “The Values Encoded in Machine Learning Research.” Patterns 3, no. 8: 100588.

Christiano, Paul F., et al. 2017. “Deep Reinforcement Learning from Human Preferences.” Advances in Neural Information Processing Systems 30.

Floridi, Luciano, and Josh Cowls. 2019. “A Unified Framework of Five Principles for AI in Society.” Harvard Data Science Review 1, no. 1.

Floridi, Luciano, et al. 2018. “AI4People—An Ethical Framework for a Good AI Society: Opportunities, Risks, Principles, and Recommendations.” Minds and Machines 28, no. 4: 689–707.

Friedman, Batya, and Helen Nissenbaum. 1996. “Bias in Computer Systems.” ACM Transactions on Information Systems 14, no. 3: 330–347.

Gabriel, Iason. 2020. “Artificial Intelligence, Values and Alignment.” Minds and Machines 30, no. 3: 411–437.

Jobin, Anna, Marcello Ienca, and Effy Vayena. 2019. “The Global Landscape of AI Ethics Guidelines.” Nature Machine Intelligence 1, no. 9: 389–399.

Jones, Peter H. 2022. “Values Conflicts in Software Innovation: Negotiating Embedded Ethics in Organizational Processes.” Journal of Responsible Innovation 9, no. 1: 1–23.

Mittelstadt, Brent. 2019. “Principles Alone Cannot Guarantee Ethical AI.” Nature Machine Intelligence 1, no. 11: 501–507.

Morley, Jessica, et al. 2021. “From What to How: An Initial Review of Publicly Available AI Ethics Tools, Methods and Research to Translate Principles into Practices.” Ethics and Information Technology 23, no. 3: 293–306.

Ouyang, Long, et al. 2022. “Training Language Models to Follow Instructions with Human Feedback.” Advances in Neural Information Processing Systems 35: 27730–27744.

Veale, Michael, and Frederik Zuiderveen Borgesius. 2021. “Demystifying the Draft EU Artificial Intelligence Act.” Computer Law Review International 22, no. 4: 97–112.

Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2021. “Why Fairness Cannot Be Automated: Bridging the Gap Between EU Non-Discrimination Law and AI.” Computer Law & Security Review 41: 105567.