

# BLOCK: Bilinear Superdiagonal Fusion for VQA and VRD

In Machine Learning, an important question is "How to fuse two modalities in a same space". For instance, in Visual Question Answering, one must fuse the image and the question embeddings into the same bi-modal space. This multimodal embedding is later classified to provide the answer.

We introduce a novel module (BLOCK) to fuse two representations together. First, we experimentally demonstrate that it is better than any available fusion for our tasks. Secondly, we provide a theoretically-grounded analysis around the notion of tensor complexity. For further details, please see our AAAI 2019 paper and poster.

In this repo, we make our BLOCK fusion available via pip install, including several powerful fusions from the state of the art (MLB, MUTAN, MCB, MFB, MFH, etc.). Also, we provide pretrained models and all the code needed to reproduce our experiments.
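To make the bilinear-fusion idea concrete, here is a minimal sketch of an *unfactorized* bilinear fusion, where each output coordinate is `x1^T W_k x2`. This is not the BLOCK module itself (BLOCK constrains such a tensor to a block-superdiagonal factorization); the dimensions are illustrative only:

```python
import torch

# Toy bilinear fusion: output coordinate k is x1^T W[:, :, k] x2.
# BLOCK replaces the full tensor W with a block-superdiagonal
# factorization; this unconstrained version is for illustration only.
x1 = torch.randn(10, 100)        # e.g. question embeddings (batch of 10)
x2 = torch.randn(10, 100)        # e.g. image embeddings
W = torch.randn(100, 100, 300)   # full bilinear tensor (what BLOCK factorizes)

y = torch.einsum('bi,ijk,bj->bk', x1, W, x2)
print(y.shape)  # torch.Size([10, 300]) -- the fused bi-modal embedding
```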

## Installation

We advise you to install python 3 with Anaconda.

### As standalone project

```bash
conda create --name block python=3
```

Download annotations, images and features for the VRD experiments:

```bash
bash block/datasets/scripts/download_vrd.sh
```

Download annotations, images and features for the VQA experiments:

```bash
bash block/datasets/scripts/download_vqa2.sh
bash block/datasets/scripts/download_vgenome.sh
bash block/datasets/scripts/download_tdiuc.sh
```

Note: the features have been extracted from a pretrained Faster-RCNN with caffe. We don't provide the code for pretraining or extracting features for now.

### As a python library

By importing the block python module, you can access every fusion, dataset and model in a simple way:

```python
import torch
from block import fusions

mm = fusions.Block([100, 100], 300)
inputs = [torch.randn(10, 100), torch.randn(10, 100)]
out = mm(inputs)  # torch.Size([10, 300])
```

```python
from block.models.networks.vqa_net import VQANet
from block.models.networks.vrd_net import VRDNet

from block.datasets.vqa2 import VQA2
from block.datasets.tdiuc import TDIUC
from block.datasets.vgenome import VG
from block.datasets.vrd import VRD
```

To be able to do so, you can install the package with pip:

```bash
pip install block.bootstrap.pytorch
```
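The other fusions mentioned above are exposed through the same `fusions` module. As a sketch, assuming they share Block's `(input_dims, output_dim)` constructor signature (the class names `Mutan`, `MLB` and `MFB` are our assumption here, not confirmed against the package), you could compare them on the same pair of inputs:

```python
import torch
from block import fusions

# Assumption: each fusion class follows the same (input_dims, output_dim)
# constructor as fusions.Block, and the class names below exist in the
# fusions module.
inputs = [torch.randn(10, 100), torch.randn(10, 100)]
for Fusion in (fusions.Block, fusions.Mutan, fusions.MLB, fusions.MFB):
    mm = Fusion([100, 100], 300)
    out = mm(inputs)
    print(Fusion.__name__, out.shape)  # expected: torch.Size([10, 300])
```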
