{"id":2382,"date":"2023-12-28T09:24:13","date_gmt":"2023-12-28T01:24:13","guid":{"rendered":"https:\/\/www.1ai.net\/?p=2382"},"modified":"2023-12-28T09:24:13","modified_gmt":"2023-12-28T01:24:13","slug":"%e5%be%ae%e8%bd%af%e6%8e%a8%e5%a4%a7%e6%a8%a1%e5%9e%8b%e6%95%b4%e5%90%88%e6%80%a7%e5%b7%a5%e5%85%b7%e5%ba%93promptbench","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/2382.html","title":{"rendered":"Microsoft launches PromptBench, an integrated tool library for large models"},"content":{"rendered":"<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%be%ae%e8%bd%af\" title=\"[View articles tagged with [Microsoft]]\" target=\"_blank\" >Microsoft<\/a> recently launched PromptBench, a tool library dedicated to evaluating <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%a7%e8%af%ad%e8%a8%80%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [large language model]]\" target=\"_blank\" >large language models<\/a>. The library provides a series of tools, including creating different types of prompts, loading datasets and models, and performing adversarial prompt attacks, to support researchers in evaluating and analyzing LLMs from different aspects.<\/p>\n<p class=\"article-content__img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2383\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2023\/12\/6383929150076561254910928.png\" alt=\"\" width=\"865\" height=\"713\" \/><\/p>\n<p>Project address: <a href=\"https:\/\/top.aibase.com\/tool\/promptbench\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/microsoft\/promptbench<\/a><\/p>\n<p>Paper address: https:\/\/arxiv.org\/abs\/2312.07910<\/p>\n<p>Key features and capabilities of PromptBench include:<\/p>\n<p>It supports multiple models and tasks: it can evaluate a variety of large language models, such as GPT-4, across multiple tasks, such as sentiment analysis and grammar checking.<\/p>\n<p>It also provides different evaluation methods, including standard, dynamic, and semantic evaluation, to comprehensively test model performance. In addition, it implements a variety of prompt engineering methods, such as few-shot chain-of-thought, emotional prompts, and expert prompts, and integrates a range of adversarial attack methods to test the model&#039;s robustness against malicious input.<\/p>\n<p>It also includes analysis tools for interpreting evaluation results, such as visual analysis and word frequency analysis. Most importantly, PromptBench provides an interface for quickly building models, loading datasets, and evaluating model performance. It can be installed and used with simple commands, making it easy for researchers to build and run evaluation pipelines.<\/p>\n<p>PromptBench supports a variety of datasets, including GLUE, MMLU, SQuAD V2, and IWSLT2017, and many models, such as GPT-4 and ChatGPT. This set of features makes PromptBench a powerful and comprehensive evaluation tool library.<\/p>","protected":false},"excerpt":{"rendered":"<p>Microsoft has recently launched an integrated tool library called PromptBench, dedicated to evaluating large language models. The library provides a range of tools, including the creation of different types of prompts, the loading of datasets and models, and the execution of adversarial prompt attacks, to support researchers in assessing and analyzing LLMs from different aspects. 
Project address: https:\/\/github.com\/microsoft\/promptbench Paper address: https:\/\/arxiv.org\/abs\/2312.07910 The main features of PromptBench include: support for multiple models and tasks, enabling the evaluation of many different large language models, such as GPT-4, across multiple tasks<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[706,280],"collection":[],"class_list":["post-2382","post","type-post","status-publish","format-standard","hentry","category-news","tag-706","tag-280"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/2382","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=2382"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/2382\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=2382"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=2382"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=2382"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=2382"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}