The mission of Ought, the organization building Elicit, is to scale high-quality, open-ended reasoning. Language models present an incredible opportunity to help with this. 

However, we see significant risks as well. Many of these risks come from the fact that today's machine learning paradigm optimizes for outcomes more than process. As machine learning models become more powerful, they may fail in important ways if they've been trained to "do whatever it takes to score well on a metric." 

Scaling up high-quality reasoning requires caring about process, not just outcomes. Independent of AI risks, good research isn't good just because it produced a good result; it's good because it followed a process that others can inspect, replicate, and extend. 

Elicit is far from perfect and still makes way more mistakes than we want it to. But as people start to fear the impact of unconstrained tools like ChatGPT, we'd love to see more products prioritize process supervision. 

Here are some ways we're making it easy to check the work of language models in Elicit: 

1. When language models extract info from or answer questions about papers in Elicit, you can quickly see the source


This is possible because of how we produced that answer in the first place. First, we break the paper into small chunks. Then, we find the chunk most relevant to your question. We give just that chunk to the language model to generate its answer. When you want to check the answer, we show you the chunk the language model saw. If Elicit struggles to answer a question about a paper, it may be because none of the chunks we have access to contain a good answer!
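The chunk-then-answer flow described above can be sketched in a few lines. This is a minimal illustration, not Elicit's implementation: the function names are hypothetical, the relevance ranking is naive word overlap standing in for a real retrieval model, and `ask_model` is a placeholder for any language model call.

```python
def chunk_paper(text, size=200):
    """Split a paper into fixed-size word chunks (hypothetical helper)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def most_relevant_chunk(chunks, question):
    """Rank chunks by naive word overlap with the question (a stand-in
    for a real retrieval model)."""
    q = set(question.lower().split())
    return max(chunks, key=lambda c: len(q & set(c.lower().split())))


def answer_with_source(paper_text, question, ask_model, chunk_size=200):
    """Answer using only the single best chunk, and return that chunk
    alongside the answer so the user can check the model's work."""
    chunks = chunk_paper(paper_text, size=chunk_size)
    source = most_relevant_chunk(chunks, question)
    prompt = f"Context:\n{source}\n\nQuestion: {question}\nAnswer:"
    return ask_model(prompt), source
```

Because the answer is generated from a single known chunk, showing the source is free: it is just the chunk that was retrieved, no post-hoc attribution needed.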

2. Elicit avoids collapsing complexity into scores
Rather than give you a single score for how replicable or trustworthy a paper is, we show you the factors researchers commonly consider when judging trustworthiness. No score is 100% accurate for all applications, especially for complex questions like trustworthiness, so in any one case a score may mislead.

Eventually, we want to be able to customize this list of trustworthiness considerations for each user, question, and paper. 

You can see more examples and related thoughts in the Twitter version of these notes.

If you're itching to bring together machine learning researchers, software engineers, designers, and user researchers to build frontier interfaces like these, apply to our product manager role! Or send us people you think would be a great fit! 



You're receiving this email because you signed up for Elicit, an AI research assistant. To view this email in your browser or share it, use this link. You can see recent product announcements here. If you don't want to get these emails, you can update your preferences or unsubscribe from this list.