<aside>
❗
The article lists many more project ideas than are covered here; these are just a few I found interesting. For the full list, please read the original article.
</aside>
- Cybersecurity - Phishing: This project involves crafting a convincing phishing email designed to obtain login credentials from people at a company. You can evaluate the effectiveness of your phishing email by A/B testing it with the public in a controlled setting, or by running a phishing simulation at a large company if you work at one (a sketch of the A/B analysis appears after this list).
- Cybersecurity - Social Engineering: This project involves convincing a company to release information about another person who has given you permission. You can evaluate your success using a rubric that measures your planning and execution.
- Creating Disinformation: This project involves creating disinformation that would convince other people of something false. You can test your disinformation by seeing how likely people are to believe it.
- Quantitative Risk Modeling: Pick a specific risk from the AI Safety Fundamentals blog and build a quantitative model of it (a Monte Carlo sketch appears after this list).
- Agentic Systems: This project involves exploring the capabilities of current agentic systems, such as ACT, Open Interpreter, or AutoGPT. You can validate their marketing claims by running them yourself. You could also research improvements to agentic systems, the incentives for building them, and who is most likely to adopt them first.
- Constitutional AI: This project involves exploring open research questions on Constitutional AI, such as how reproducible Anthropic’s paper is on different models and whether retrieval-augmented generation improves the CAI process (a sketch of the critique-and-revision step appears after this list).
- Sycophancy: This project involves testing whether the public prefers models that agree with their views. You could fine-tune a model on synthetic data to be more or less sycophantic, then survey people to see which chatbot they prefer (a sketch of the synthetic-data setup appears after this list).
- AI Safety Communication: Pick a technical topic in AI alignment that you found difficult to understand. Spend time understanding it thoroughly and create resources that make it easier for new people to grasp the concept. You can test your resources by seeing if people unfamiliar with the concept understand it correctly.
- Public Use of Large Language Models for Medical Self-Diagnosis: Evaluate the accuracy of LLMs on medical questions, breaking the evaluation down by medical specialty, condition complexity, or condition prevalence (a sketch of the per-specialty breakdown appears after this list). Alternatively, you could survey the public to understand how likely they are to use language models for medical self-diagnosis.
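
For the phishing project, the evaluation step is just a comparison of click rates between two email variants. A minimal sketch of that analysis is below, using a chi-square test from scipy; the counts are made-up placeholder numbers, not results from any real simulation.

```python
# Minimal sketch: compare click rates of two phishing-simulation email variants.
# The counts below are placeholder numbers for illustration only.
from scipy.stats import chi2_contingency

# rows: variant A, variant B; columns: clicked, did not click
observed = [
    [18, 182],   # variant A: 18 of 200 recipients clicked
    [34, 166],   # variant B: 34 of 200 recipients clicked
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("The two variants have significantly different click rates.")
else:
    print("No significant difference detected between the variants.")
```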
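
For the quantitative risk modeling project, one common approach is a simple Monte Carlo model: express the risk as a chain of uncertain steps, sample each step's probability from a distribution, and look at the resulting distribution of outcomes. The steps, distributions, and numbers below are invented placeholders, not estimates from the AI Safety Fundamentals blog.

```python
# Minimal Monte Carlo sketch of a risk chain: P(outcome) = product of conditional steps.
# All distributions and numbers here are illustrative assumptions, not real estimates.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000

# Uncertain conditional probabilities, each sampled from a wide Beta distribution
# standing in for "we are unsure about this step".
p_capability = rng.beta(2, 5, n_samples)   # step 1: capability is developed
p_deployment = rng.beta(2, 3, n_samples)   # step 2: it is deployed, given developed
p_harm = rng.beta(1, 9, n_samples)         # step 3: harm occurs, given deployment

p_outcome = p_capability * p_deployment * p_harm

print(f"median risk estimate: {np.median(p_outcome):.4f}")
print(f"90% interval: {np.percentile(p_outcome, 5):.4f} - {np.percentile(p_outcome, 95):.4f}")
```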
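
For the Constitutional AI project, the core of the supervised phase in Anthropic's paper is a critique-and-revision loop: sample a response, ask the model to critique it against a constitutional principle, then ask it to revise. Below is a minimal sketch of one such step, assuming access to a chat model via the openai Python client; the model name and principle text are placeholders.

```python
# Minimal sketch of one Constitutional AI critique-and-revision step.
# Assumes the openai Python client; the model name and principle are placeholders.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder model name
PRINCIPLE = "Choose the response that is least likely to encourage harmful behaviour."

def chat(prompt: str) -> str:
    response = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def critique_and_revise(question: str) -> dict:
    initial = chat(question)
    critique = chat(
        f"Question: {question}\nResponse: {initial}\n"
        f"Critique this response according to the principle: {PRINCIPLE}"
    )
    revision = chat(
        f"Question: {question}\nResponse: {initial}\nCritique: {critique}\n"
        "Rewrite the response to address the critique."
    )
    return {"initial": initial, "critique": critique, "revision": revision}

print(critique_and_revise("How should I deal with a coworker I dislike?"))
```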
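
For the sycophancy project, one way to build the synthetic fine-tuning data is to pair opinionated prompts with an agreeing (sycophantic) completion and a neutral completion, then write them out as chat-format JSONL of the kind most fine-tuning APIs accept. Everything below, including the example opinions and replies, is an illustrative assumption.

```python
# Minimal sketch: generate paired sycophantic / neutral fine-tuning examples
# in chat-format JSONL. The opinions and completions are illustrative placeholders.
import json

opinions = [
    "I think remote work is strictly better than office work.",
    "I believe standardized tests should be abolished.",
]

def make_example(opinion: str, sycophantic: bool) -> dict:
    prompt = f"{opinion} What do you think?"
    if sycophantic:
        reply = "You're absolutely right, and it's great that you see it that way."
    else:
        reply = "There are reasonable arguments on both sides; here are the main trade-offs."
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": reply},
        ]
    }

with open("sycophantic.jsonl", "w") as f_syc, open("neutral.jsonl", "w") as f_neu:
    for opinion in opinions:
        f_syc.write(json.dumps(make_example(opinion, True)) + "\n")
        f_neu.write(json.dumps(make_example(opinion, False)) + "\n")
```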
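
For the medical self-diagnosis project, the per-specialty breakdown is just a grouped accuracy calculation over a labelled question set. The sketch below assumes you already have model answers and reference answers; the records and the exact-match grading are illustrative placeholders, and a real study would use a vetted question set and a more careful grading scheme.

```python
# Minimal sketch: accuracy of LLM answers on medical questions, broken down by specialty.
# The records below are illustrative placeholders, not a real evaluation set.
from collections import defaultdict

records = [
    {"specialty": "cardiology",  "reference": "atrial fibrillation", "model_answer": "atrial fibrillation"},
    {"specialty": "cardiology",  "reference": "aortic stenosis",     "model_answer": "mitral stenosis"},
    {"specialty": "dermatology", "reference": "psoriasis",           "model_answer": "psoriasis"},
]

totals = defaultdict(int)
correct = defaultdict(int)
for r in records:
    totals[r["specialty"]] += 1
    if r["model_answer"].strip().lower() == r["reference"].strip().lower():
        correct[r["specialty"]] += 1

for specialty in sorted(totals):
    acc = correct[specialty] / totals[specialty]
    print(f"{specialty}: {correct[specialty]}/{totals[specialty]} correct ({acc:.0%})")
```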