Lilian Weng analyzes the importance of "thinking time" for large language models, arguing that performance on complex tasks can be significantly improved by spending more computation at test time (e.g., chain-of-thought reasoning, pause tokens). There are two main strategies for model "thinking": parallel sampling (generating multiple outputs at once and selecting among them) and sequential revision (iteratively revising based on the previous round's output). In practice, thinking time must be balanced against computational cost. She also finds that optimizing the chain of thought with reinforcement learning can lead to reward hacking, where the model hides its true intentions in its chain of thought, a problem left for future research.
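To make the contrast concrete, here is a minimal Python sketch of the two test-time strategies. The `generate` and `score` functions are hypothetical stand-ins for a model call and a verifier; they are assumptions for illustration, not code from the original post.

```python
import random

# Hypothetical stand-in for one model call: samples a single output
# for a prompt, optionally conditioned on an earlier draft.
def generate(prompt: str, context: str = "") -> str:
    return f"answer to {prompt!r} given {context!r} ({random.random():.3f})"

# Hypothetical stand-in for a verifier or reward model that rates an output.
def score(output: str) -> float:
    return random.random()

def parallel_sampling(prompt: str, n: int = 8) -> str:
    """Best-of-N style parallel sampling: draw N independent samples
    at once, then keep the highest-scoring candidate."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

def sequential_revision(prompt: str, rounds: int = 4) -> str:
    """Sequential revision: each round conditions on the previous
    draft, trading latency for iterative refinement."""
    draft = generate(prompt)
    for _ in range(rounds):
        draft = generate(prompt, context=draft)
    return draft

if __name__ == "__main__":
    print(parallel_sampling("What is 17 * 24?"))
    print(sequential_revision("What is 17 * 24?"))
```

Parallel sampling spends its compute budget in breadth (many independent tries, easy to parallelize), while sequential revision spends it in depth (later drafts can correct earlier mistakes but cannot be parallelized), which is the thinking-time versus cost trade-off noted above.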