{"id":22306,"date":"2024-10-31T08:46:02","date_gmt":"2024-10-31T00:46:02","guid":{"rendered":"https:\/\/www.1ai.net\/?p=22306"},"modified":"2024-10-31T08:46:02","modified_gmt":"2024-10-31T00:46:02","slug":"openai-%e5%bc%80%e6%ba%90-simpleqa-%e6%96%b0%e5%9f%ba%e5%87%86%ef%bc%8c%e4%b8%93%e6%b2%bb%e5%a4%a7%e6%a8%a1%e5%9e%8b%e8%83%a1%e8%a8%80%e4%b9%b1%e8%af%ad","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/22306.html","title":{"rendered":"OpenAI Open-Sources New SimpleQA Benchmark to Curb Large Models' \"Nonsense\""},"content":{"rendered":"<p>October 31, 2024 - On October 30, local time, <a href=\"https:\/\/www.1ai.net\/en\/tag\/openai\" title=\"View articles tagged with OpenAI\" target=\"_blank\" >OpenAI<\/a> announced that, to measure the factual accuracy of <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%af%ad%e8%a8%80%e6%a8%a1%e5%9e%8b\" title=\"View articles tagged with language model\" target=\"_blank\" >language models<\/a>, it is <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"View articles tagged with open source\" target=\"_blank\" >open-sourcing<\/a> a new benchmark called <a href=\"https:\/\/www.1ai.net\/en\/tag\/simpleqa\" title=\"View articles tagged with SimpleQA\" target=\"_blank\" >SimpleQA<\/a>, which measures the ability of language models to answer short fact-seeking questions.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-22307\" title=\"75badecbj00sm73ff001qd000s500l4p\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/10\/75badecbj00sm73ff001qd000s500l4p.jpg\" alt=\"75badecbj00sm73ff001qd000s500l4p\" width=\"1013\" height=\"760\" \/><\/p>\n<blockquote>\n<ul>\n<li>One of the open challenges in AI is how to train models to generate <strong>factually correct<\/strong> answers. Current language models sometimes <strong>produce incorrect output or unverified answers<\/strong>, a problem known as \"hallucination\". 
Language models that generate more accurate responses with fewer hallucinations are more reliable and can be used in a broader range of applications.<\/li>\n<\/ul>\n<\/blockquote>\n<p>OpenAI states that its goal with SimpleQA is to create a dataset with the following characteristics:<\/p>\n<ul>\n<li><strong>High correctness:<\/strong> Reference answers are independently verified by two AI trainers to ensure fair grading.<\/li>\n<li><strong>Diversity:<\/strong> SimpleQA covers a wide range of topics, from science and technology to TV shows and video games.<\/li>\n<li><strong>Challenging for frontier models:<\/strong> Compared with earlier benchmarks such as TriviaQA (2017) or NQ (2019), SimpleQA is more difficult even for frontier models (for example, GPT-4o scores below 40%).<\/li>\n<li><strong>Efficient user experience:<\/strong> SimpleQA's questions and answers are short and clear, so the benchmark runs quickly and can be graded rapidly via the OpenAI API and similar services. In addition, with 4,326 questions, SimpleQA should have low variance as an evaluation.<\/li>\n<\/ul>\n<p>SimpleQA is intended to be a <strong>simple but challenging<\/strong> benchmark for evaluating the factual accuracy of frontier models. Its main limitation is scope: although SimpleQA is accurate, it measures factuality only in the constrained setting of short, fact-seeking queries with a single verifiable answer.<\/p>\n<p>OpenAI says that whether the factuality a model exhibits on short answers correlates with its <strong>performance on long-form content containing many facts<\/strong> remains an <strong>open<\/strong> research question, and one that SimpleQA may help address. 
OpenAI hopes that open-sourcing SimpleQA will further advance AI research and make models more trustworthy and reliable.<\/p>\n<p data-vmark=\"b5a4\"><span class=\"referenceTitle\">Related links:<\/span><\/p>\n<ul class=\"custom_reference list-paddingleft-1\">\n<li class=\"list-undefined list-reference-paddingleft\">\n<p data-vmark=\"2693\">Open-source repository:<span class=\"link-text-start-with-http\">https:\/\/github.com\/openai\/simple-evals\/<\/span><\/p>\n<\/li>\n<li class=\"list-undefined list-reference-paddingleft\">\n<p data-vmark=\"8320\">Paper:<span class=\"link-text-start-with-http\">https:\/\/cdn.openai.com\/papers\/simpleqa.pdf<\/span><\/p>\n<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>On October 31, OpenAI announced that it is open-sourcing a new benchmark called SimpleQA, which measures the ability of language models to answer short fact-seeking questions and thereby gauges their factual accuracy. One of the open challenges in AI is how to train models to generate factually correct answers. Current language models sometimes produce incorrect or unverified answers, a problem known as \"hallucination\". Language models that produce more accurate answers with fewer hallucinations are more reliable and can be used in a wider range of applications. 
According to OpenAI, the goal is to use SimpleQA to create a dataset with the following characteristics.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[190,4794,219,1144],"collection":[],"class_list":["post-22306","post","type-post","status-publish","format-standard","hentry","category-news","tag-openai","tag-simpleqa","tag-219","tag-1144"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/22306","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=22306"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/22306\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=22306"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=22306"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=22306"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=22306"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}