New Blog by Lilian Weng, Former VP of Safety at OpenAI: Why We Think

Lilian Weng examines the value of "thinking time" for large models, arguing that performance on complex tasks can be improved substantially by increasing test-time compute (e.g., chain-of-thought reasoning, pause tokens). She identifies two main strategies for model "thinking": parallel sampling (generating multiple outputs simultaneously) and sequential revision (iteratively revising the previous round's output); in practice, thinking time must be balanced against computational cost. She also finds that optimizing the chain of thought with reinforcement learning can lead to reward hacking, where the model hides its true intentions inside the chain of thought, a problem left to future research.
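A minimal sketch contrasting the two test-time strategies described above, parallel sampling (best-of-N) versus sequential revision. The `generate`, `revise`, and scoring functions here are hypothetical stubs standing in for model and verifier calls, not an API from the blog post:

```python
import random


def generate(prompt: str, seed: int) -> tuple[str, float]:
    """Stub for one model call; returns (answer, score).
    In practice the score would come from a verifier or reward model."""
    rng = random.Random(seed)
    return f"draft-{seed}", rng.random()


def parallel_sampling(prompt: str, n: int = 4) -> tuple[str, float]:
    """Best-of-N: sample n candidates independently, keep the highest-scoring."""
    candidates = [generate(prompt, seed) for seed in range(n)]
    return max(candidates, key=lambda c: c[1])


def revise(prompt: str, previous: str, step: int) -> tuple[str, float]:
    """Stub revision call: conditions on the previous draft."""
    rng = random.Random(step)  # deterministic stand-in for a model call
    return f"{previous}+rev{step}", rng.random()


def sequential_revision(prompt: str, steps: int = 4) -> tuple[str, float]:
    """Iteratively revise the last draft, keeping the best answer seen so far."""
    answer, score = generate(prompt, seed=0)
    best = (answer, score)
    for step in range(1, steps):
        answer, score = revise(prompt, answer, step)
        if score > best[1]:
            best = (answer, score)
    return best
```

Parallel sampling spends its compute budget on independent tries (easy to parallelize), while sequential revision spends it on depth, conditioning each attempt on the last; the trade-off between the two is exactly the thinking-time vs. cost balance the post discusses.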
