A new study shows that AI safety can fail from a single prompt change, revealing causal flaws in guardrails and raising questions about the future of alignment.