{"id":57630,"date":"2026-04-27T09:09:33","date_gmt":"2026-04-27T07:09:33","guid":{"rendered":"https:\/\/www.nae.fr\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/"},"modified":"2026-04-27T09:09:33","modified_gmt":"2026-04-27T07:09:33","slug":"tri-modal-fusion-transformers-for-uav-based-object-detection","status":"publish","type":"post","link":"https:\/\/www.nae.fr\/en\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/","title":{"rendered":"Tri-Modal Fusion Transformers for UAV-based Object Detection"},"content":{"rendered":"<blockquote>\n<div class=\"info-article\">\n<div class=\"title-hat pl-0\">\n<div class=\"info-article\">\n<div class=\"title-hat pl-0\">\n<div class=\"row mx-0\">\n<div class=\"info-article\">\n<div class=\"title-hat pl-0\">\n<div class=\"info-article\">\n<div class=\"title-hat pl-0\">\n<div class=\"info-article\">\n<div class=\"title-hat pl-0\">\n<div class=\"info-article\">\n<div class=\"title-hat pl-0\">\n<div class=\"ExpressionSummary svelte-ccn03w\">\n<div class=\"row mx-0\">\n<div class=\"chapo\">\n<div class=\"mb-4\">\n<div class=\"chapo\">\n<div class=\"post__live-container--answer\">\n<div class=\"summary\">\n<div class=\"summary\">\n<p data-iceapw=\"35\" data-iceapc=\"1\">Reliable UAV object detection requires robustness to illumination changes, motion blur, and scene dynamics that suppress RGB cues. Thermal long-wave infrared (LWIR) sensing preserves contrast in low light, and event cameras retain microsecond-level temporal edges, but integrating all three modalities in a unified detector has not been systematically studied. We present a tri-modal framework that processes RGB, thermal, and event data with a dual-stream hierarchical vision transformer. 
At selected encoder depths, a Modality-Aware Gated Exchange (MAGE) applies inter-sensor channel and spatial gating, and a Bidirectional Token Exchange (BiTE) module performs bidirectional token-level attention with depthwise-pointwise refinement, producing resolution-preserving fused maps for a standard feature pyramid and two-stage detector.<\/p>\n\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div><\/blockquote>\n<div class=\"info-article\">\n<div class=\"title-hat pl-0\">\n<div class=\"info-article\">\n<div class=\"title-hat pl-0\">\n<div class=\"row mx-0\">\n<div class=\"info-article\">\n<div class=\"title-hat pl-0\">\n<div class=\"info-article\">\n<div class=\"title-hat pl-0\">\n<div class=\"info-article\">\n<div class=\"title-hat pl-0\">\n<div class=\"info-article\">\n<div class=\"title-hat pl-0\">\n\nRead more: <a href=\"https:\/\/arxiv.org\/abs\/2604.16630\" target=\"_blank\" rel=\"noopener\">Tri-Modal Fusion Transformers for UAV-based Object Detection<\/a>\n\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>Reliable UAV object detection requires robustness to illumination changes, motion blur, and scene dynamics that suppress RGB cues. Thermal long-wave infrared (LWIR) sensing preserves contrast in low light, and event cameras retain microsecond-level temporal edges, but integrating all three modalities in a unified detector has not been systematically studied. 
We present a tri-modal framework that [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":56493,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[34,16],"tags":[35,44,33],"class_list":["post-57630","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-innovation-et-technologique","category-rti","tag-actualites","tag-developpement-des-systemes-intelligents","tag-drones"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Tri-Modal Fusion Transformers for UAV-based Object Detection - NAE<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.nae.fr\/en\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Tri-Modal Fusion Transformers for UAV-based Object Detection - NAE\" \/>\n<meta property=\"og:description\" content=\"Reliable UAV object detection requires robustness to illumination changes, motion blur, and scene dynamics that suppress RGB cues. Thermal long-wave infrared (LWIR) sensing preserves contrast in low light, and event cameras retain microsecond-level temporal edges, but integrating all three modalities in a unified detector has not been systematically studied. 
We present a tri-modal framework that [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.nae.fr\/en\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/\" \/>\n<meta property=\"og:site_name\" content=\"NAE\" \/>\n<meta property=\"article:published_time\" content=\"2026-04-27T07:09:33+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.nae.fr\/wp-content\/uploads\/2026\/06\/logo-cornell-university.png\" \/>\n\t<meta property=\"og:image:width\" content=\"225\" \/>\n\t<meta property=\"og:image:height\" content=\"225\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"adminwa\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"adminwa\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.nae.fr\\\/2026\\\/04\\\/27\\\/tri-modal-fusion-transformers-for-uav-based-object-detection\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.nae.fr\\\/2026\\\/04\\\/27\\\/tri-modal-fusion-transformers-for-uav-based-object-detection\\\/\"},\"author\":{\"name\":\"adminwa\",\"@id\":\"https:\\\/\\\/www.nae.fr\\\/#\\\/schema\\\/person\\\/3d658e930f01449b7195ce4a78fcfc1e\"},\"headline\":\"Tri-Modal Fusion Transformers for UAV-based Object 
Detection\",\"datePublished\":\"2026-04-27T07:09:33+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.nae.fr\\\/2026\\\/04\\\/27\\\/tri-modal-fusion-transformers-for-uav-based-object-detection\\\/\"},\"wordCount\":126,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.nae.fr\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.nae.fr\\\/2026\\\/04\\\/27\\\/tri-modal-fusion-transformers-for-uav-based-object-detection\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.nae.fr\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/logo-cornell-university.png\",\"keywords\":[\"Actualit\u00e9s\",\"D\u00e9veloppement des syst\u00e8mes intelligents\",\"Drones\"],\"articleSection\":[\"Innovation et technologique\",\"RTI\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.nae.fr\\\/2026\\\/04\\\/27\\\/tri-modal-fusion-transformers-for-uav-based-object-detection\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.nae.fr\\\/2026\\\/04\\\/27\\\/tri-modal-fusion-transformers-for-uav-based-object-detection\\\/\",\"url\":\"https:\\\/\\\/www.nae.fr\\\/2026\\\/04\\\/27\\\/tri-modal-fusion-transformers-for-uav-based-object-detection\\\/\",\"name\":\"Tri-Modal Fusion Transformers for UAV-based Object Detection - 
NAE\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.nae.fr\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.nae.fr\\\/2026\\\/04\\\/27\\\/tri-modal-fusion-transformers-for-uav-based-object-detection\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.nae.fr\\\/2026\\\/04\\\/27\\\/tri-modal-fusion-transformers-for-uav-based-object-detection\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.nae.fr\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/logo-cornell-university.png\",\"datePublished\":\"2026-04-27T07:09:33+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.nae.fr\\\/2026\\\/04\\\/27\\\/tri-modal-fusion-transformers-for-uav-based-object-detection\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.nae.fr\\\/2026\\\/04\\\/27\\\/tri-modal-fusion-transformers-for-uav-based-object-detection\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.nae.fr\\\/2026\\\/04\\\/27\\\/tri-modal-fusion-transformers-for-uav-based-object-detection\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.nae.fr\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/logo-cornell-university.png\",\"contentUrl\":\"https:\\\/\\\/www.nae.fr\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/logo-cornell-university.png\",\"width\":225,\"height\":225},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.nae.fr\\\/2026\\\/04\\\/27\\\/tri-modal-fusion-transformers-for-uav-based-object-detection\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\\\/\\\/www.nae.fr\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Tri-Modal Fusion Transformers for UAV-based Object Detection\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.nae.fr\\\/#website\",\"url\":\"https:\\\/\\\/www.nae.fr\\\/\",\"name\":\"NAE\",\"description\":\"NAE fili\u00e8re 
d&#039;excellence...\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.nae.fr\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.nae.fr\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.nae.fr\\\/#organization\",\"name\":\"NAE\",\"url\":\"https:\\\/\\\/www.nae.fr\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.nae.fr\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.nae.fr\\\/wp-content\\\/uploads\\\/2025\\\/10\\\/nae-logo.svg\",\"contentUrl\":\"https:\\\/\\\/www.nae.fr\\\/wp-content\\\/uploads\\\/2025\\\/10\\\/nae-logo.svg\",\"width\":84,\"height\":52,\"caption\":\"NAE\"},\"image\":{\"@id\":\"https:\\\/\\\/www.nae.fr\\\/#\\\/schema\\\/logo\\\/image\\\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.nae.fr\\\/#\\\/schema\\\/person\\\/3d658e930f01449b7195ce4a78fcfc1e\",\"name\":\"adminwa\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/5118570d863e9bebccd6a13a0e571e5515c30a2f455e20ed92788cb2b4e5c631?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/5118570d863e9bebccd6a13a0e571e5515c30a2f455e20ed92788cb2b4e5c631?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/5118570d863e9bebccd6a13a0e571e5515c30a2f455e20ed92788cb2b4e5c631?s=96&d=mm&r=g\",\"caption\":\"adminwa\"},\"sameAs\":[\"https:\\\/\\\/www.nae.fr\"],\"url\":\"https:\\\/\\\/www.nae.fr\\\/en\\\/author\\\/adminwa\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Tri-Modal Fusion Transformers for UAV-based Object Detection - NAE","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.nae.fr\/en\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/","og_locale":"en_US","og_type":"article","og_title":"Tri-Modal Fusion Transformers for UAV-based Object Detection - NAE","og_description":"Reliable UAV object detection requires robustness to illumination changes, motion blur, and scene dynamics that suppress RGB cues. Thermal long-wave infrared (LWIR) sensing preserves contrast in low light, and event cameras retain microsecond-level temporal edges, but integrating all three modalities in a unified detector has not been systematically studied. We present a tri-modal framework that [&hellip;]","og_url":"https:\/\/www.nae.fr\/en\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/","og_site_name":"NAE","article_published_time":"2026-04-27T07:09:33+00:00","og_image":[{"width":225,"height":225,"url":"https:\/\/www.nae.fr\/wp-content\/uploads\/2026\/06\/logo-cornell-university.png","type":"image\/png"}],"author":"adminwa","twitter_card":"summary_large_image","twitter_misc":{"Written by":"adminwa","Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.nae.fr\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/#article","isPartOf":{"@id":"https:\/\/www.nae.fr\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/"},"author":{"name":"adminwa","@id":"https:\/\/www.nae.fr\/#\/schema\/person\/3d658e930f01449b7195ce4a78fcfc1e"},"headline":"Tri-Modal Fusion Transformers for UAV-based Object Detection","datePublished":"2026-04-27T07:09:33+00:00","mainEntityOfPage":{"@id":"https:\/\/www.nae.fr\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/"},"wordCount":126,"commentCount":0,"publisher":{"@id":"https:\/\/www.nae.fr\/#organization"},"image":{"@id":"https:\/\/www.nae.fr\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/#primaryimage"},"thumbnailUrl":"https:\/\/www.nae.fr\/wp-content\/uploads\/2026\/06\/logo-cornell-university.png","keywords":["Actualit\u00e9s","D\u00e9veloppement des syst\u00e8mes intelligents","Drones"],"articleSection":["Innovation et technologique","RTI"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.nae.fr\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.nae.fr\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/","url":"https:\/\/www.nae.fr\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/","name":"Tri-Modal Fusion Transformers for UAV-based Object Detection - 
NAE","isPartOf":{"@id":"https:\/\/www.nae.fr\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.nae.fr\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/#primaryimage"},"image":{"@id":"https:\/\/www.nae.fr\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/#primaryimage"},"thumbnailUrl":"https:\/\/www.nae.fr\/wp-content\/uploads\/2026\/06\/logo-cornell-university.png","datePublished":"2026-04-27T07:09:33+00:00","breadcrumb":{"@id":"https:\/\/www.nae.fr\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.nae.fr\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.nae.fr\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/#primaryimage","url":"https:\/\/www.nae.fr\/wp-content\/uploads\/2026\/06\/logo-cornell-university.png","contentUrl":"https:\/\/www.nae.fr\/wp-content\/uploads\/2026\/06\/logo-cornell-university.png","width":225,"height":225},{"@type":"BreadcrumbList","@id":"https:\/\/www.nae.fr\/2026\/04\/27\/tri-modal-fusion-transformers-for-uav-based-object-detection\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/www.nae.fr\/"},{"@type":"ListItem","position":2,"name":"Tri-Modal Fusion Transformers for UAV-based Object Detection"}]},{"@type":"WebSite","@id":"https:\/\/www.nae.fr\/#website","url":"https:\/\/www.nae.fr\/","name":"NAE","description":"NAE fili\u00e8re 
d&#039;excellence...","publisher":{"@id":"https:\/\/www.nae.fr\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.nae.fr\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.nae.fr\/#organization","name":"NAE","url":"https:\/\/www.nae.fr\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.nae.fr\/#\/schema\/logo\/image\/","url":"https:\/\/www.nae.fr\/wp-content\/uploads\/2025\/10\/nae-logo.svg","contentUrl":"https:\/\/www.nae.fr\/wp-content\/uploads\/2025\/10\/nae-logo.svg","width":84,"height":52,"caption":"NAE"},"image":{"@id":"https:\/\/www.nae.fr\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.nae.fr\/#\/schema\/person\/3d658e930f01449b7195ce4a78fcfc1e","name":"adminwa","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/5118570d863e9bebccd6a13a0e571e5515c30a2f455e20ed92788cb2b4e5c631?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/5118570d863e9bebccd6a13a0e571e5515c30a2f455e20ed92788cb2b4e5c631?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5118570d863e9bebccd6a13a0e571e5515c30a2f455e20ed92788cb2b4e5c631?s=96&d=mm&r=g","caption":"adminwa"},"sameAs":["https:\/\/www.nae.fr"],"url":"https:\/\/www.nae.fr\/en\/author\/adminwa\/"}]}},"_links":{"self":[{"href":"https:\/\/www.nae.fr\/en\/wp-json\/wp\/v2\/posts\/57630","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.nae.fr\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.nae.fr\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.nae.fr\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.nae.fr\/en\/wp-json\/wp\/v2\/comments?post=57630"}],"version-history":[{"count":0,"href":"https
:\/\/www.nae.fr\/en\/wp-json\/wp\/v2\/posts\/57630\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.nae.fr\/en\/wp-json\/wp\/v2\/media\/56493"}],"wp:attachment":[{"href":"https:\/\/www.nae.fr\/en\/wp-json\/wp\/v2\/media?parent=57630"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.nae.fr\/en\/wp-json\/wp\/v2\/categories?post=57630"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.nae.fr\/en\/wp-json\/wp\/v2\/tags?post=57630"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}