Transformers

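Updated 2023-03-08 14:29

State-of-the-art natural language processing for Jax, PyTorch and TensorFlow

Transformers provides thousands of pretrained models for text classification, information extraction, question answering, summarization, translation and text generation in more than 100 languages. Its aim is to make state-of-the-art NLP easy for everyone to use.

Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets, and then share them with the community on the model hub. At the same time, each Python module that defines an architecture is fully standalone, which makes it easy to modify for quick research experiments.

Transformers is backed by the three most popular deep learning libraries, Jax, PyTorch and TensorFlow, and integrates seamlessly with each of them. You can train your model with one framework and then load it for inference with another.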

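Online demos

You can test most of the models on the model hub directly on their model pages. We also offer private model hosting, model versioning and an inference API.

Here are a few examples:

Write With Transformer, built by the Hugging Face team, is the official demo of text generation.

If you are looking for custom support from the Hugging Face team: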

HuggingFace Expert Acceleration Program

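Quick tour

To use a model right away, we provide the pipeline API. A pipeline groups together a pretrained model with the text preprocessing that goes with it. Here is a quick example that uses a pipeline to classify positive versus negative sentiment: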

>>> from transformers import pipeline

# Allocate a pipeline for sentiment analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9996980428695679}]

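The second line of code downloads and caches the pretrained model used by the pipeline, while the third evaluates it on the given text. Here the answer is "positive" with a confidence of 99.97%.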
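As a small illustration of reading that result, the sketch below formats a pipeline-style prediction as a percentage. The list literal is copied from the output above rather than produced by a live model call, and `format_prediction` is a hypothetical helper name, not part of the library.

```python
# Result shape copied from the sentiment-analysis example above:
# a list with one dict per input text.
result = [{'label': 'POSITIVE', 'score': 0.9996980428695679}]


def format_prediction(prediction):
    """Render one {'label', 'score'} dict as 'LABEL (xx.xx%)'.

    Hypothetical helper for illustration; not part of transformers.
    """
    return f"{prediction['label']} ({prediction['score'] * 100:.2f}%)"


summary = format_prediction(result[0])
print(summary)
```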

Many NLP tasks have a pretrained pipeline ready to use out of the box. For example, we can easily extract the answer to a question from a given text:

>>> from transformers import pipeline

# Allocate a pipeline for question answering
>>> question_answerer = pipeline('question-answering')
>>> question_answerer({
...     'question': 'What is the name of the repository ?',
...     'context': 'Pipeline has been included in the huggingface/transformers repository'
... })
{'score': 0.30970096588134766, 'start': 34, 'end': 58, 'answer': 'huggingface/transformers'}

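In addition to the answer, the pretrained model returns its confidence score, along with the start and end positions of the answer (in the output above, these are character offsets into the context). You can learn more about the tasks supported by the pipeline API in this tutorial.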
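One way to read the `start` and `end` fields: in this example they line up with character offsets into the context string, so slicing the context recovers the answer. The sketch below checks that using the literal values shown above, without calling a model.

```python
# Inputs and output copied from the question-answering example above.
context = 'Pipeline has been included in the huggingface/transformers repository'
result = {'score': 0.30970096588134766, 'start': 34, 'end': 58,
          'answer': 'huggingface/transformers'}

# Slicing the context with the returned offsets recovers the answer span.
span = context[result['start']:result['end']]
print(span)
```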

Downloading and using any pretrained model on your own task is just as simple and takes only three lines of code. Here is the PyTorch version:

>>> from transformers import AutoTokenizer, AutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = AutoModel.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)

And here is the equivalent code for TensorFlow:

>>> from transformers import AutoTokenizer, TFAutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = TFAutoModel.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("Hello world!", return_tensors="tf")
>>> outputs = model(**inputs)

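The tokenizer handles the preprocessing for all pretrained models and can be called directly on a single string (as in the examples above) or on a list. It outputs a dictionary (dict) that you can use in downstream code or pass straight to your model with the ** argument unpacking operator.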
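To make the `**` unpacking step concrete, here is a minimal sketch in plain Python. `toy_model` is a hypothetical stand-in for a model's forward call, and the token ids are made-up placeholders; only the key names `input_ids` and `attention_mask` mirror what a BERT-style tokenizer typically returns.

```python
def toy_model(input_ids, attention_mask=None, token_type_ids=None):
    """Hypothetical stand-in for a model's forward call; it only counts the
    tokens that the attention mask marks as real (non-padding)."""
    if attention_mask is None:
        attention_mask = [[1] * len(row) for row in input_ids]
    return [sum(row) for row in attention_mask]


# A dict shaped like a tokenizer output for two sequences, the second padded.
# The ids are made-up placeholders, not real vocabulary entries.
inputs = {
    "input_ids": [[1, 11, 12, 2], [1, 13, 2, 0]],
    "attention_mask": [[1, 1, 1, 1], [1, 1, 1, 0]],
}

# `**inputs` expands the dict into keyword arguments, so this call is
# equivalent to toy_model(input_ids=..., attention_mask=...).
real_token_counts = toy_model(**inputs)
print(real_token_counts)
```

Because the dict keys become keyword arguments, any key the callee does not accept would raise a `TypeError`, which is why the tokenizer and model are normally loaded from the same checkpoint.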

The model itself is a regular PyTorch nn.Module or TensorFlow tf.keras.Model (depending on your backend), which you can use as usual. This tutorial explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our Trainer API to quickly fine-tune on a new dataset.
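As a rough sketch of what "a classic training loop" could look like on the PyTorch side, the function below performs a single optimization step. It is written against duck-typed objects and makes one assumption worth stating: that `model(**batch)` returns an object with a `.loss` attribute, which holds when the batch includes `labels` and the model has a task-specific head. The name `training_step` is illustrative, not a library function.

```python
def training_step(model, batch, optimizer):
    """Run one optimization step and return the loss as a Python float.

    Assumes `model(**batch)` returns an object with a `.loss` attribute,
    which is the case when the batch includes `labels` and the model has
    a task-specific head.
    """
    outputs = model(**batch)   # forward pass; the batch dict is unpacked
    loss = outputs.loss
    loss.backward()            # backpropagate
    optimizer.step()           # update parameters
    optimizer.zero_grad()      # clear gradients for the next step
    return loss.item()
```

In a real loop, `model` might be an `AutoModelForSequenceClassification` instance, `optimizer` a `torch.optim.AdamW` over `model.parameters()`, and `batch` a dict of tensors yielded by a `DataLoader`.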

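Why should I use transformers?

  1. Easy-to-use state-of-the-art models:
    • High performance on NLU and NLG tasks.
    • Friendly to teaching and practice, with a low barrier to entry.
    • High-level abstractions, with only three classes to learn.
    • A unified API across all models.
  2. Lower compute costs, smaller carbon footprint:
    • Researchers can share trained models instead of retraining from scratch every time.
    • Engineers can reduce compute time and production costs.
    • Dozens of model architectures, more than 2,000 pretrained models, and support for over 100 languages.
  3. Coverage for every part of a model's lifecycle:
    • Train state-of-the-art models in just 3 lines of code.
    • Move a model between deep learning frameworks at will.
    • Seamlessly pick the most suitable framework for training, evaluation and production.
  4. Easily customize a model or an example to your needs:
    • We provide several examples for each architecture to reproduce the results of the original papers.
    • Model internals are kept transparent and consistent.
    • Model files can be used on their own, which makes modification and quick experiments easy.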
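The point above about moving a model between frameworks can be sketched as follows. This assumes a version of the library that ships both the PyTorch and TensorFlow model classes, with both backends installed; the function name and the directory argument are illustrative choices. The imports sit inside the function so the sketch can be read and loaded without either backend present.

```python
def convert_pytorch_checkpoint_to_tensorflow(checkpoint, save_dir):
    """Load a checkpoint with PyTorch, save it, and reload it in TensorFlow.

    Imports are kept inside the function so that the sketch can be read and
    imported without both backends installed; calling it requires
    transformers, torch and tensorflow.
    """
    from transformers import AutoModel, TFAutoModel

    pt_model = AutoModel.from_pretrained(checkpoint)
    pt_model.save_pretrained(save_dir)  # writes config and PyTorch weights
    # from_pt=True asks the TensorFlow class to convert the PyTorch weights
    return TFAutoModel.from_pretrained(save_dir, from_pt=True)
```

A call such as `convert_pytorch_checkpoint_to_tensorflow("bert-base-uncased", "./bert-pt")` would then return a TensorFlow model; the `./bert-pt` path is a placeholder.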

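When shouldn't I use transformers?

  • This library is not a modular toolbox of building blocks for neural networks. The code in the model files is deliberately left unrefactored, without extra layers of abstraction, so that researchers can quickly iterate on and modify each model without getting lost in abstractions and jumping between files.
  • The Trainer API is not meant to work with any model; it is optimized only for the models in this library. If you are looking for a training loop for general-purpose machine learning, use another library.
  • Although we have done our best, the scripts in the examples directory are just that: examples. They will not necessarily work out of the box on your specific problem, and you may need to change a few lines of code to adapt them.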

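Installation

With pip

This repository is tested on Python 3.6+, Flax 0.3.2+, PyTorch 1.3.1+ and TensorFlow 2.3+.

You can install Transformers in a virtual environment. If you are not familiar with Python virtual environments, see the user guide.

First, create a virtual environment with the version of Python you intend to use and activate it.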
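For readers who prefer to script this step, here is a minimal sketch using the standard-library `venv` module. The helper name `create_virtualenv` and the `.env` directory name are assumptions for illustration; activation still happens in your shell, as noted in the comments.

```python
import venv
from pathlib import Path


def create_virtualenv(env_dir, with_pip=True):
    """Create a virtual environment in `env_dir` using the running interpreter.

    with_pip=True bootstraps pip inside the new environment.
    """
    env_path = Path(env_dir)
    venv.create(env_path, with_pip=with_pip)
    return env_path


# Example (not run here): create_virtualenv(".env")
# Then activate it from your shell:
#   POSIX:   source .env/bin/activate
#   Windows: .env\Scripts\activate
```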

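Then, you need to install one of Flax, PyTorch or TensorFlow. For instructions on installing these frameworks on your platform, see the TensorFlow installation page, the PyTorch installation page or the Flax installation page.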
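A quick way to confirm that a backend is present before continuing is to look it up on the import path. The sketch below uses only the standard library; the import names `flax`, `torch` and `tensorflow` are the usual top-level module names for these frameworks, and `installed_backends` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names of the three supported backends.
BACKENDS = ("flax", "torch", "tensorflow")


def installed_backends():
    """Return the subset of BACKENDS that can be found on the import path."""
    return [name for name in BACKENDS if find_spec(name) is not None]


available = installed_backends()
if available:
    print("Backends found:", ", ".join(available))
else:
    print("No backend found; install Flax, PyTorch or TensorFlow first.")
```

`find_spec` locates a module without importing it, so the check stays fast even when a heavy framework is installed.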

Once one of those backends has been installed, Transformers can be installed as follows:

pip install transformers

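If you want to try the examples, or want to use the latest development code before an official release, you need to install from source.

With conda

Since Transformers version 4.0.0, we have a conda channel: huggingface.

Transformers can be installed with conda as follows: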

conda install -c huggingface transformers

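To install one of Flax, PyTorch or TensorFlow with conda, follow the instructions on their respective installation pages.

Model architectures

All the model checkpoints supported by Transformers are uploaded by users and organizations and are seamlessly integrated with the huggingface.co model hub.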

Transformers currently supports the following architectures (see here for an overview of the models):

  1. ALBERT (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
  2. ALIGN (from Google Research) released with the paper Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
  3. AltCLIP (from BAAI) released with the paper AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
  4. Audio Spectrogram Transformer (from MIT) released with the paper AST: Audio Spectrogram Transformer by Yuan Gong, Yu-An Chung, James Glass.
  5. BART (from Facebook) released with the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
  6. BARThez (from École polytechnique) released with the paper BARThez: a Skilled Pretrained French Sequence-to-Sequence Model by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
  7. BARTpho (from VinAI Research) released with the paper BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
  8. BEiT (from Microsoft) released with the paper BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei.
  9. BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
  10. BERT For Sequence Generation (from Google) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
  11. BERTweet (from VinAI Research) released with the paper BERTweet: A pre-trained language model for English Tweets by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
  12. BigBird-Pegasus (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
  13. BigBird-RoBERTa (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
  14. BioGpt (from Microsoft Research AI4Science) released with the paper BioGPT: generative pre-trained transformer for biomedical text generation and mining by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu.
  15. BiT (from Google AI) released with the paper Big Transfer (BiT) by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby.
  16. Blenderbot (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
  17. BlenderbotSmall (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
  18. BLIP (from Salesforce) released with the paper BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi.
  19. BLIP-2 (from Salesforce) released with the paper BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models by Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi.
  20. BLOOM (from BigScience workshop) released by the BigScience Workshop.
  21. BORT (from Alexa) released with the paper Optimal Subarchitecture Extraction For BERT by Adrian de Wynter and Daniel J. Perry.
  22. BridgeTower (from Harbin Institute of Technology/Microsoft Research Asia/Intel Labs) released with the paper BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan.
  23. ByT5 (from Google Research) released with the paper ByT5: Towards a token-free future with pre-trained byte-to-byte models by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
  24. CamemBERT (from Inria/Facebook/Sorbonne) released with the paper CamemBERT: a Tasty French Language Model by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
  25. CANINE (from Google Research) released with the paper CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
  26. Chinese-CLIP (from OFA-Sys) released with the paper Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou.
  27. CLAP (from LAION-AI) released with the paper Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation (https://arxiv.org/abs/2211.06687) by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov.
  28. CLIP (from OpenAI) released with the paper Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
  29. CLIPSeg (from University of Göttingen) released with the paper Image Segmentation Using Text and Image Prompts by Timo Lüddecke and Alexander Ecker.
  30. CodeGen (from Salesforce) released with the paper A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
  31. Conditional DETR (from Microsoft Research Asia) released with the paper Conditional DETR for Fast Training Convergence by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.
  32. ConvBERT (from YituTech) released with the paper ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
  33. ConvNeXT (from Facebook AI) released with the paper A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
  34. CPM (from Tsinghua University) released with the paper CPM: A Large-scale Generative Chinese Pre-trained Language Model by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
  35. CTRL (from Salesforce) released with the paper CTRL: A Conditional Transformer Language Model for Controllable Generation by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
  36. CvT (from Microsoft) released with the paper CvT: Introducing Convolutions to Vision Transformers by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
  37. Data2Vec (from Facebook) released with the paper Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
  38. DeBERTa (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
  39. DeBERTa-v2 (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
  40. Decision Transformer (from Berkeley/Facebook/Google) released with the paper Decision Transformer: Reinforcement Learning via Sequence Modeling by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
  41. Deformable DETR (from SenseTime Research) released with the paper Deformable DETR: Deformable Transformers for End-to-End Object Detection by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.
  42. DeiT (from Facebook) released with the paper Training data-efficient image transformers & distillation through attention by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
  43. DETA (from The University of Texas at Austin) released with the paper NMS Strikes Back by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl.
  44. DETR (from Facebook) released with the paper End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
  45. DialoGPT (from Microsoft Research) released with the paper DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
  46. DiNAT (from SHI Labs) released with the paper Dilated Neighborhood Attention Transformer by Ali Hassani and Humphrey Shi.
  47. DistilBERT (from HuggingFace) released with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT-2 into DistilGPT2, RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT, and to produce a German version of DistilBERT.
  48. DiT (from Microsoft Research) released with the paper DiT: Self-supervised Pre-training for Document Image Transformer by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
  49. Donut (from NAVER) released with the paper OCR-free Document Understanding Transformer by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
  50. DPR (from Facebook) released with the paper Dense Passage Retrieval for Open-Domain Question Answering by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
  51. DPT (from Intel Labs) released with the paper Vision Transformers for Dense Prediction by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
  52. EfficientFormer (from Snap Research) released with the paper EfficientFormer: Vision Transformers at MobileNetSpeed by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren.
  53. EfficientNet (from Google Brain) released with the paper EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks by Mingxing Tan, Quoc V. Le.
  54. ELECTRA (from Google Research/Stanford University) released with the paper ELECTRA: Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
  55. EncoderDecoder (from Google Research) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
  56. ERNIE (from Baidu) released with the paper ERNIE: Enhanced Representation through Knowledge Integration by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
  57. ErnieM (from Baidu) released with the paper ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang.
  58. ESM (from Meta AI) are transformer protein language models. ESM-1b was released with the paper Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. ESM-1v was released with the paper Language models enable zero-shot prediction of the effects of mutations on protein function by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. ESM-2 was released with the paper Language models of protein sequences at the scale of evolution enable accurate structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
  59. FLAN-T5 (from Google AI) released in the repository google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
  60. FLAN-UL2 (from Google AI) released in the repository google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
  61. FlauBERT (from CNRS) released with the paper FlauBERT: Unsupervised Language Model Pre-training for French by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
  62. FLAVA (from Facebook AI) released with the paper FLAVA: A Foundational Language And Vision Alignment Model by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
  63. FNet (from Google Research) released with the paper FNet: Mixing Tokens with Fourier Transforms by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
  64. Funnel Transformer (from CMU/Google Brain) released with the paper Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
  65. GIT (from Microsoft Research) released with the paper GIT: A Generative Image-to-text Transformer for Vision and Language by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang.
  66. GLPN (from KAIST) released with the paper Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
  67. GPT (from OpenAI) released with the paper Improving Language Understanding by Generative Pre-Training by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
  68. GPT Neo (from EleutherAI) released in the repository EleutherAI/gpt-neo by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
  69. GPT NeoX (from EleutherAI) released with the paper GPT-NeoX-20B: An Open-Source Autoregressive Language Model by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
  70. GPT NeoX Japanese (from ABEJA) released by Shinya Otani, Takayoshi Makabe, Anuj Arora, Kyo Hattori.
  71. GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
  72. GPT-J (from EleutherAI) released in the repository kingoflolz/mesh-transformer-jax by Ben Wang and Aran Komatsuzaki.
  73. GPT-Sw3 (from AI-Sweden) released with the paper Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren.
  74. GPTSAN-japanese released in the repository tanreinama/GPTSAN by 坂本俊之(tanreinama).
  75. Graphormer (from Microsoft) released with the paper Do Transformers Really Perform Bad for Graph Representation? by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu.
  76. GroupViT (from UCSD, NVIDIA) released with the paper GroupViT: Semantic Segmentation Emerges from Text Supervision by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
  77. Hubert (from Facebook) released with the paper HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
  78. I-BERT (from Berkeley) released with the paper I-BERT: Integer-only BERT Quantization by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
  79. ImageGPT (from OpenAI) released with the paper Generative Pretraining from Pixels by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
  80. Informer (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
  81. Jukebox (from OpenAI) released with the paper Jukebox: A Generative Model for Music by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.
  82. LayoutLM (from Microsoft Research Asia) released with the paper LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
  83. LayoutLMv2 (from Microsoft Research Asia) released with the paper LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
  84. LayoutLMv3 (from Microsoft Research Asia) released with the paper LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
  85. LayoutXLM (from Microsoft Research Asia) released with the paper LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
  86. LED (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
  87. LeViT (from Meta AI) released with the paper LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
  88. LiLT (from South China University of Technology) released with the paper LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding by Jiapeng Wang, Lianwen Jin, Kai Ding.
  89. Longformer (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
  90. LongT5 (from Google AI) released with the paper LongT5: Efficient Text-To-Text Transformer for Long Sequences by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
  91. LUKE (from Studio Ousia) released with the paper LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
  92. LXMERT (from UNC Chapel Hill) released with the paper LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering by Hao Tan and Mohit Bansal.
  93. M-CTC-T (from Facebook) released with the paper Pseudo-Labeling For Massively Multilingual Speech Recognition by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
  94. M2M100 (from Facebook) released with the paper Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
  95. MarianMT: machine translation models trained on OPUS data, released by Jörg Tiedemann. The Marian Framework is developed by the Microsoft Translator team.
  96. MarkupLM (from Microsoft Research Asia) released with the paper MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.
  97. Mask2Former (from FAIR and UIUC) released with the paper Masked-attention Mask Transformer for Universal Image Segmentation by Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar.
  98. MaskFormer (from Meta and UIUC) released with the paper Per-Pixel Classification is Not All You Need for Semantic Segmentation by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov
  99. mBART (from Facebook) released with the paper Multilingual Denoising Pre-training for Neural Machine Translation by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
  100. mBART-50 (from Facebook) released with the paper Multilingual Translation with Extensible Multilingual Pretraining and Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
  101. Megatron-BERT (from NVIDIA) released with the paper Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
  102. Megatron-GPT2 (from NVIDIA) released with the paper Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
  103. mLUKE (from Studio Ousia) released with the paper mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
  104. MobileBERT (from CMU/Google Brain) released with the paper MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
  105. MobileNetV1 (from Google Inc.) released with the paper MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.
  106. MobileNetV2 (from Google Inc.) released with the paper MobileNetV2: Inverted Residuals and Linear Bottlenecks by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.
  107. MobileViT (from Apple) released with the paper MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer by Sachin Mehta and Mohammad Rastegari.
  108. MPNet (from Microsoft Research) released with the paper MPNet: Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
  109. MT5 (from Google AI) released with the paper mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
  110. MVP (from RUC AI Box, Renmin University of China) released with the paper MVP: Multi-task Supervised Pre-training for Natural Language Generation by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
  111. NAT (from SHI Labs) released with the paper Neighborhood Attention Transformer by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.
  112. Nezha (from Huawei Noah's Ark Lab) released with the paper NEZHA: Neural Contextualized Representation for Chinese Language Understanding by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
  113. NLLB (from Meta) released with the paper No Language Left Behind: Scaling Human-Centered Machine Translation by the NLLB team.
  114. Nyströmformer (from the University of Wisconsin - Madison) released with the paper Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
  115. OneFormer (from SHI Labs) released with the paper OneFormer: One Transformer to Rule Universal Image Segmentation by Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi.
  116. OPT (from Meta AI) released with the paper OPT: Open Pre-trained Transformer Language Models by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
  117. OWL-ViT (from Google AI) released with the paper Simple Open-Vocabulary Object Detection with Vision Transformers by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
  118. Pegasus (from Google) released with the paper PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
  119. PEGASUS-X (from Google) released with the paper Investigating Efficiently Extending Transformers for Long Input Summarization by Jason Phang, Yao Zhao, Peter J. Liu.
  120. Perceiver IO (from Deepmind) released with the paper Perceiver IO: A General Architecture for Structured Inputs & Outputs by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
  121. PhoBERT (from VinAI Research) released with the paper PhoBERT: Pre-trained language models for Vietnamese by Dat Quoc Nguyen and Anh Tuan Nguyen.
  122. PLBart (from UCLA NLP) released with the paper Unified Pre-training for Program Understanding and Generation by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
  123. PoolFormer (from Sea AI Labs) released with the paper MetaFormer is Actually What You Need for Vision by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
  124. ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
  125. QDQBert (from NVIDIA) released with the paper Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
  126. RAG (from Facebook) released with the paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
  127. REALM (from Google Research) released with the paper REALM: Retrieval-Augmented Language Model Pre-Training by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
  128. Reformer (from Google Research) released with the paper Reformer: The Efficient Transformer by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
  129. RegNet (from META Research) released with the paper Designing Network Design Space by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
  130. RemBERT (from Google Research) released with the paper Rethinking embedding coupling in pre-trained language models by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
  131. ResNet (from Microsoft Research) released with the paper Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
  132. RoBERTa (from Facebook) released with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
  133. RoBERTa-PreLayerNorm (from Facebook) released with the paper fairseq: A Fast, Extensible Toolkit for Sequence Modeling by Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli.
  134. RoCBert (from WeChatAI) released with the paper RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining by Hui Su, Weiwei Shi, Xiaoyu Shen, Xiao Zhou, Tuo Ji, Jiarui Fang, Jie Zhou.
  135. RoFormer (from ZhuiyiTechnology) released with the paper RoFormer: Enhanced Transformer with Rotary Position Embedding by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
  136. SegFormer (from NVIDIA) released with the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
  137. SEW (from ASAPP) released with the paper Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
  138. SEW-D (from ASAPP) released with the paper Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
  139. SpeechT5 (from Microsoft Research) released with the paper SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.
  140. SpeechToTextTransformer (from Facebook) released with the paper fairseq S2T: Fast Speech-to-Text Modeling with fairseq by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
  141. SpeechToTextTransformer2 (from Facebook) released with the paper Large-Scale Self- and Semi-Supervised Learning for Speech Translation by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
  142. Splinter (from Tel Aviv University) released with the paper Few-Shot Question Answering by Pretraining Span Selection by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
  143. SqueezeBERT (from Berkeley) released with the paper SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
  144. Swin Transformer (from Microsoft) released with the paper Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
  145. Swin Transformer V2 (from Microsoft) released with the paper Swin Transformer V2: Scaling Up Capacity and Resolution by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
  146. Swin2SR (from University of Würzburg) released with the paper Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte.
  147. SwitchTransformers (from Google) released with the paper Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity by William Fedus, Barret Zoph, Noam Shazeer.
  148. T5 (from Google AI) released with the paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
  149. T5v1.1 (from Google AI) released in the repository google-research/text-to-text-transfer-transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
  150. Table Transformer (from Microsoft Research) released with the paper PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents by Brandon Smock, Rohith Pesala, Robin Abraham.
  151. TAPAS (from Google AI) released with the paper TAPAS: Weakly Supervised Table Parsing via Pre-training by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
  152. TAPEX (from Microsoft Research) released with the paper TAPEX: Table Pre-training via Learning a Neural SQL Executor by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
  153. Time Series Transformer (from HuggingFace).
  154. TimeSformer (from Facebook) released with the paper Is Space-Time Attention All You Need for Video Understanding? by Gedas Bertasius, Heng Wang, Lorenzo Torresani.
  155. Trajectory Transformer (from the University of California at Berkeley) released with the paper Offline Reinforcement Learning as One Big Sequence Modeling Problem by Michael Janner, Qiyang Li, Sergey Levine.
  156. Transformer-XL (from Google/CMU) released with the paper Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
  157. TrOCR (from Microsoft) released with the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
  158. TVLT (from UNC Chapel Hill) released with the paper TVLT: Textless Vision-Language Transformer by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal.
  159. UL2 (from Google Research) released with the paper Unifying Language Learning Paradigms by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler.
  160. UniSpeech (from Microsoft Research) released with the paper UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
  161. UniSpeechSat (from Microsoft Research) released with the paper UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
  162. UPerNet (from Peking University) released with the paper Unified Perceptual Parsing for Scene Understanding by Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun.
  163. VAN (from Tsinghua University and Nankai University) released with the paper Visual Attention Network by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
  164. VideoMAE (from Multimedia Computing Group, Nanjing University) released with the paper VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
  165. ViLT (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision by Wonjae Kim, Bokyung Son, Ildoo Kim.
  166. Vision Transformer (ViT) (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
  167. VisualBERT (from UCLA NLP) released with the paper VisualBERT: A Simple and Performant Baseline for Vision and Language by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
  168. ViT Hybrid (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
  169. ViTMAE (from Meta AI) released with the paper Masked Autoencoders Are Scalable Vision Learners by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
  170. ViTMSN (from Meta AI) released with the paper Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.
  171. Wav2Vec2 (from Facebook AI) released with the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
  172. Wav2Vec2-Conformer (from Facebook AI) released with the paper FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.
  173. Wav2Vec2Phoneme (from Facebook AI) released with the paper Simple and Effective Zero-shot Cross-lingual Phoneme Recognition by Qiantong Xu, Alexei Baevski, Michael Auli.
  174. WavLM (from Microsoft Research) released with the paper WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
  175. Whisper (from OpenAI) released with the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.
  176. X-CLIP (from Microsoft Research) released with the paper Expanding Language-Image Pretrained Models for General Video Recognition by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling.
  177. X-MOD (from Meta AI) released with the paper Lifting the Curse of Multilinguality by Pre-training Modular Transformers by Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe.
  178. XGLM (from Facebook AI) released with the paper Few-shot Learning with Multilingual Language Models by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
  179. XLM (from Facebook) released with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
  180. XLM-ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
  181. XLM-RoBERTa (from Facebook AI) released with the paper Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
  182. XLM-RoBERTa-XL (from Facebook AI) released with the paper Larger-Scale Transformers for Multilingual Masked Language Modeling by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
  183. XLM-V (from Meta AI) released with the paper XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models by Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa.
  184. XLNet (from Google/CMU) released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
  185. XLS-R (from Facebook AI) released with the paper XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
  186. XLSR-Wav2Vec2 (from Facebook AI) released with the paper Unsupervised Cross-Lingual Representation Learning For Speech Recognition by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
  187. YOLOS (from Huazhong University of Science & Technology) released with the paper You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
  188. YOSO (from the University of Wisconsin - Madison) released with the paper You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
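  189. Want to contribute a new model? We have a detailed guide and templates to walk you through adding one. You can find them in the templates directory. Be sure to read the contributing guidelines, and contact the maintainers or open a new issue to collect feedback before starting your PR.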

To check whether a model already has an implementation in Flax, PyTorch or TensorFlow, or whether it has a matching tokenizer in the Tokenizers library, refer to this table.
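The same question can also be asked programmatically. The following is a minimal sketch, not the library's own mechanism: it assumes the class naming pattern `<Prefix>Model` for PyTorch, `TF<Prefix>Model` for TensorFlow and `Flax<Prefix>Model` for Flax, and the helper name `implemented_frameworks` is hypothetical. The table remains the authoritative source.

```python
import importlib


def implemented_frameworks(model_prefix, module=None):
    """Report which framework-specific base classes a module exposes.

    Assumes the naming pattern <Prefix>Model (PyTorch), TF<Prefix>Model
    (TensorFlow) and Flax<Prefix>Model (Flax). A False value only means
    "no class found under this name", not proof that nothing exists.
    """
    if module is None:
        # Imported lazily so the helper can be defined without the library installed.
        module = importlib.import_module("transformers")
    class_names = {
        "PyTorch": f"{model_prefix}Model",
        "TensorFlow": f"TF{model_prefix}Model",
        "Flax": f"Flax{model_prefix}Model",
    }
    # hasattr returns False when the attribute lookup raises AttributeError.
    return {framework: hasattr(module, name) for framework, name in class_names.items()}
```

With the library installed, `implemented_frameworks("Bert")` looks up `BertModel`, `TFBertModel` and `FlaxBertModel`. Models whose classes do not follow this pattern will be reported as missing, so treat the result as a hint only.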

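These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the documentation.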

Learn more

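Section | Description
Documentation | Full API documentation and tutorials
Task summary | Tasks supported by Transformers
Preprocessing tutorial | Using the Tokenizer to prepare data for the models
Training and fine-tuning | Using the models provided by Transformers in a PyTorch/TensorFlow training loop or with the Trainer API
Quick tour: fine-tuning and example scripts | Example scripts for a wide range of tasks
Model sharing and uploading | Upload and share your fine-tuned models with the community
Migration | Migrate to Transformers from pytorch-transformers or pytorch-pretrained-bert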

Citation

We have published a paper for this library. If you use the Transformers library, please cite:

@inproceedings{wolf-etal-2020-transformers,
    title = "Transformers: State-of-the-Art Natural Language Processing",
    author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = oct,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
    pages = "38--45"
}

