{"id":18610,"date":"2024-08-25T09:25:05","date_gmt":"2024-08-25T01:25:05","guid":{"rendered":"https:\/\/www.1ai.net\/?p=18610"},"modified":"2024-08-25T09:25:05","modified_gmt":"2024-08-25T01:25:05","slug":"meta-%e5%8f%91%e5%b8%83-sapiens-%e8%a7%86%e8%a7%89%e6%a8%a1%e5%9e%8b%ef%bc%8c%e8%ae%a9-ai-%e5%88%86%e6%9e%90%e5%92%8c%e7%90%86%e8%a7%a3%e5%9b%be%e7%89%87-%e8%a7%86%e9%a2%91%e4%b8%ad%e4%ba%ba","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/18610.html","title":{"rendered":"Meta releases Sapiens visual model to enable AI to analyze and understand human actions in images\/videos"},"content":{"rendered":"<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/meta\" title=\"[View articles tagged with [Meta]]\" target=\"_blank\" >Meta<\/a> Reality Labs has recently released <a href=\"https:\/\/www.1ai.net\/en\/tag\/sapiens\" title=\"[See articles with [Sapiens] labels]\" target=\"_blank\" >Sapiens<\/a>, a family of <a href=\"https:\/\/www.1ai.net\/en\/tag\/ai%e8%a7%86%e8%a7%89%e6%a8%a1%e5%9e%8b\" title=\"_OTHER ORGANISER\" target=\"_blank\" >AI vision models<\/a> for four fundamental human-centric vision tasks: 2D pose estimation, body part segmentation, depth estimation, and surface normal prediction.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-18611\" title=\"d75a09adj00sir2ko000ud000k000b8m\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/08\/d75a09adj00sir2ko000ud000k000b8m.jpg\" alt=\"d75a09adj00sir2ko000ud000k000b8m\" width=\"720\" height=\"404\" \/><\/p>\n<p>The models range from 300 million to 2 billion parameters. They use a vision transformer architecture in which all tasks share the same encoder, while each task has its own decoder head.<\/p>\n<ul>\n<li><strong>2D Pose Estimation:<\/strong> This task involves detecting and localizing key points of a human body in a 2D image. 
These key points usually correspond to joints such as elbows, knees, and shoulders, and help in understanding a person\u2019s posture and movements.<\/li>\n<li><strong>Body Part Segmentation:<\/strong> This task segments an image into different body parts, such as head, torso, arms, and legs. Each pixel in the image is classified as belonging to a specific body part, which is useful for applications such as virtual try-on and medical imaging.<\/li>\n<li><strong>Depth Estimation:<\/strong> This task estimates the distance of each pixel in the image from the camera, effectively recovering 3D structure from a 2D image. This is crucial for applications such as augmented reality and autonomous driving, where understanding the layout of a space is important.<\/li>\n<li><strong>Surface Normal Prediction:<\/strong> This task predicts the orientation of surfaces in an image. Each pixel is assigned a normal vector that indicates which direction the surface is facing. This information is valuable for 3D reconstruction and for understanding the geometry of objects in the scene.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-18612\" title=\"4f61108ej00sir2kp001od000ms00a7m\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/08\/4f61108ej00sir2kp001od000ms00a7m.jpg\" alt=\"4f61108ej00sir2kp001od000ms00a7m\" width=\"820\" height=\"367\" \/><\/p>\n<p>Meta said the models natively support 1K high-resolution inference and are very easy to adapt to individual tasks by simply fine-tuning models pretrained on more than 300 million in-the-wild human images.<\/p>\n<p>Even when labeled data is scarce or entirely synthetic, the resulting models show excellent generalization to in-the-wild data.<\/p>","protected":false},"excerpt":{"rendered":"<p>Meta Reality Labs has recently introduced AI vision models called Sapiens for four basic human-centered vision tasks: 2D pose estimation, body part segmentation, depth estimation, and surface 
normal prediction. These models vary in the number of parameters, from 300 million to 2 billion. They use a vision transformer architecture where the tasks share the same encoder and each task has a different decoder head. 2D Pose Estimation: this task consists of detecting and localizing key points of the human body in a 2D image. These key points usually correspond to joints such as elbows, knees, and shoulders and help in understanding human posture and movement. Body Part Segmentation: this task segments the image into different body parts such as head, torso, arms and legs. Each of the image's<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[4131,297,4132],"collection":[],"class_list":["post-18610","post","type-post","status-publish","format-standard","hentry","category-news","tag-ai","tag-meta","tag-sapiens"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/18610","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=18610"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/18610\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=18610"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=18610"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=18610"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collec
tion?post=18610"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}