Weak-to-strong generalization

bruno, December 14, 2023

We present a new research direction for superalignment, together with promising initial results: can we leverage the generalization properties of deep learning to control strong models with weak supervisors?