Computer Vision : CVPR

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

Submit: Shi, Baoguang, Xiang Bai, and Cong Yao. CVPR (2017)

Blog: https://medium.com/@mldevhong/논문-번역-rcnn-an-end-to-end-trainable-neural-network-for-image-based-sequence-recognition-and-its-f6456886d6f8

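Proposes CRNN (Convolutional Recurrent Neural Network), which combines a CNN with an RNN.
The model extracts image features with a CNN, feeds those features into a bidirectional LSTM, and converts the output into a label sequence to recognize text.
Useful for extracting strings from images or for recognizing sheet music given as images.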
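
The conversion from per-frame outputs to a label sequence can be sketched as follows. This is a minimal illustration that assumes CTC-style greedy decoding (take the best class per frame, merge consecutive repeats, drop blanks). The blank index, the toy alphabet, and the name `greedy_ctc_decode` are assumptions made for this example, not taken from the authors' code.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC "blank" symbol


def greedy_ctc_decode(frame_scores, alphabet):
    """Collapse per-frame class scores into a label sequence.

    frame_scores: one list of class scores per frame (index 0 = blank).
    alphabet: characters for class indices 1..len(alphabet).
    """
    # 1) pick the best class for every frame
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    # 2) merge consecutive repeats
    collapsed = [k for k, _ in groupby(best)]
    # 3) drop blanks and map indices to characters
    return "".join(alphabet[k - 1] for k in collapsed if k != BLANK)


# Toy per-frame scores over the classes [blank, 'a', 'b']
scores = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # a (a new 'a', because a blank separates it)
    [0.1, 0.1, 0.8],    # b
]
print(greedy_ctc_decode(scores, "ab"))  # aab
```

The blank symbol is what lets the decoder tell a doubled letter ("aa") apart from one letter that spans several frames.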

Latent Embeddings for Zero-shot Classification

Submit: Xian, Yongqin. CVPR (2016)

Paper: https://arxiv.org/abs/1603.08895?context=cs

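Places word2vec (w2v) embeddings built from text corpora and the image space in a single shared space.
Uses the property that similar pairs are pulled together and dissimilar pairs are pushed apart (e.g., the Google experiment).
Choosing the number of models K: increase the number of W matrices, run the algorithm five times, and remove any W whose result is below 5%.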
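
A small sketch of how several latent maps W can score an image against a class embedding. It assumes the compatibility is the maximum over bilinear scores, `max_i x^T W_i y`, and it reads "below 5%" as "selected for fewer than 5% of the samples"; both readings are assumptions. All names and toy values (`compatibility`, `prune`, the 2-dimensional vectors) are hypothetical.

```python
def bilinear(x, W, y):
    """x^T W y for plain Python lists."""
    return sum(x[i] * W[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def compatibility(x, Ws, y):
    """Score and index of the best-matching latent map: max_i x^T W_i y."""
    scores = [bilinear(x, W, y) for W in Ws]
    best = max(range(len(Ws)), key=scores.__getitem__)
    return scores[best], best


def predict(x, Ws, class_embeddings):
    """Zero-shot prediction: the class whose embedding is most compatible with x."""
    return max(class_embeddings, key=lambda c: compatibility(x, Ws, class_embeddings[c])[0])


def prune(Ws, usage_counts, min_fraction=0.05):
    """Keep only the maps that were selected for at least min_fraction of the samples."""
    total = sum(usage_counts)
    return [W for W, n in zip(Ws, usage_counts) if n / total >= min_fraction]


# Toy setup: 2-d image feature, 2-d class embeddings, K = 2 latent maps.
Ws = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.0], [0.0, 2.0]]]
class_embeddings = {"zebra": [1.0, 0.0], "horse": [0.0, 1.0]}
x = [1.0, 0.2]

print(predict(x, Ws, class_embeddings))       # zebra
print(len(prune(Ws, usage_counts=[97, 3])))   # 1
```

Because the classes only enter through their embeddings, a class never seen during training can still be scored as long as it has a word vector.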

Learning Deep Features for Discriminative Localization

Submit: Zhou, Bolei. CVPR (2016)

Paper: https://arxiv.org/abs/1512.04150

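Addresses the problem that existing CNN models ignore the location information of image features.
Proposes adding to the feature layers of an existing CNN a convolutional layer whose number of channels equals the number of classes, and using each channel as a _Class Activation Map (CAM)_. Each CAM is reduced with GAP (Global Average Pooling) and used for classification.
When CAM was applied to VGGNet, GoogLeNet, and similar networks, classification error increased slightly compared with the original VGGNet or GoogLeNet. At the localization stage, however, localization error decreased compared with previous work.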
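
The relationship between the class map and the class score can be checked with a few lines of plain Python. The sketch below assumes the weighted-sum form `M_c(x, y) = sum_k w_k^c * f_k(x, y)`; the feature maps, weights, and function names are toy values chosen for illustration.

```python
def global_average_pool(fmap):
    """Mean of one 2D map."""
    return sum(map(sum, fmap)) / (len(fmap) * len(fmap[0]))


def class_activation_map(feature_maps, class_weights):
    """Weighted sum of the feature maps for one class."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [sum(wk * f[i][j] for wk, f in zip(class_weights, feature_maps)) for j in range(w)]
        for i in range(h)
    ]


def peak_location(cam):
    """(row, col) of the strongest activation, a crude localization cue."""
    positions = [(i, j) for i in range(len(cam)) for j in range(len(cam[0]))]
    return max(positions, key=lambda p: cam[p[0]][p[1]])


# Two toy 2x2 feature maps and the weights of a single class.
feature_maps = [[[0.0, 4.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
class_weights = [1.0, 0.5]

cam = class_activation_map(feature_maps, class_weights)
print(cam)                        # [[1.0, 5.0], [1.0, 1.0]]
print(global_average_pool(cam))   # 2.0, the class score
print(peak_location(cam))         # (0, 1)
```

Pooling the class map gives the same score as pooling each feature map first and then applying the class weights, which is why the map can be read as a spatial breakdown of the classification score.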

Learning Deep Structure-Preserving Image-Text Embeddings

Submit: Wang, Liwei, Yin Li, and Svetlana Lazebnik. CVPR (2016)

Paper: https://arxiv.org/abs/1511.06078

Github: https://github.com/lwwang/Two_branch_network

Proposes learning a joint embedding of images and text with a two-branch neural network built from linear projections and nonlinear activation functions.
Uses a loss function for "structure preservation," which reduces the distance between images (and between texts) that belong to the same category.
Improves accuracy on image-to-text and text-to-image retrieval.
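
One way to picture the loss is as a sum of triplet hinge terms: two cross-view ranking terms plus a within-view structure term. The sketch below is a simplified illustration under that assumption. The margin, the weight `lam`, the Euclidean distance, and every name are hypothetical choices, not values from the paper.

```python
import math


def dist(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hinge(anchor, positive, negative, margin=0.1):
    """Triplet hinge: max(0, margin + d(anchor, positive) - d(anchor, negative))."""
    return max(0.0, margin + dist(anchor, positive) - dist(anchor, negative))


def total_loss(img, txt, neg_txt, neg_img, same_class_img, other_class_img, lam=0.5):
    # Cross-view ranking in both directions (image -> text and text -> image).
    cross_view = hinge(img, txt, neg_txt) + hinge(txt, img, neg_img)
    # Structure preservation inside the image view: images of the same
    # category should be closer to each other than to other categories.
    structure = hinge(img, same_class_img, other_class_img)
    return cross_view + lam * structure


img, txt = [0.0, 0.0], [0.0, 1.0]
# Every negative is far away, so no triplet violates its margin.
print(total_loss(img, txt, [3.0, 4.0], [3.0, 5.0], [1.0, 0.0], [6.0, 8.0]))  # 0.0
```

A symmetric structure term for the text view can be added in the same way; it is left out here to keep the sketch short.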

Learning Two-Branch Neural Networks for Image-Text Matching Task

Submit: Wang, Liwei. CVPR (2017)

Paper: https://arxiv.org/abs/1704.03470

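Proposes performing image-text matching through embeddings based on a two-branch neural network.
The model takes the element-wise product of the text and image embedding vectors, passes it through FC layers, and is trained with a logistic loss.
Image-text retrieval performance improved over previous methods.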
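
The fusion step and the loss can be sketched in a few lines. This is a simplified illustration that uses a single FC layer producing one scalar score; the weights, bias, and function names are hypothetical, and the labels are assumed to be +1 for a matching pair and -1 otherwise.

```python
import math


def similarity_score(img_emb, txt_emb, fc_weights, fc_bias):
    """Element-wise product of the two embeddings followed by one FC layer."""
    fused = [a * b for a, b in zip(img_emb, txt_emb)]
    return sum(w * f for w, f in zip(fc_weights, fused)) + fc_bias


def logistic_loss(score, label):
    """log(1 + exp(-label * score)), with label in {+1, -1}."""
    return math.log1p(math.exp(-label * score))


score = similarity_score([1.0, 2.0], [3.0, 0.5], fc_weights=[1.0, 1.0], fc_bias=0.0)
print(score)  # 4.0
# A high score is cheap for a matching pair and expensive for a mismatched one.
print(logistic_loss(score, +1) < logistic_loss(score, -1))  # True
```

Unlike a distance in an embedding space, this score is learned end to end, so the network decides which dimensions of the fused vector matter for a match.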

Lightweight Network Architecture for Real-Time Action Recognition

Submit: Kozlov, Alexander, Vadim Andronov, Yana Gritsenko. CVPR (2019)

Paper: https://arxiv.org/abs/1905.08711

Git: https://github.com/opencv/openvino_training_extensions/tree/develop/pytorch_toolkit/action_recognition

Blog: https://deepmi.me/paper/architecture/19303/

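Presents VTN (Video Transformer Network), a model that takes input from a monocular RGB camera and runs at real-time speed on a CPU.
Explains how multiple models can be combined into a single model to improve accuracy.
Video-level tasks such as action recognition must aggregate information from multiple frames and take temporal structure into account in order to resolve ambiguity in motion.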
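
Aggregating information across frames can be illustrated with single-head scaled dot-product self-attention over per-frame embeddings, followed by averaging. This is only a sketch: it omits the learned query, key, and value projections and the multiple heads that a real Transformer block would have, and the function names and toy embeddings are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product attention with Q = K = V = frames (no learned projections)."""
    d = len(frames[0])
    out = []
    for q in frames:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frames])
        # Each output frame is a weighted mix of every frame in the clip.
        out.append([sum(w * v[i] for w, v in zip(weights, frames)) for i in range(d)])
    return out


def clip_embedding(frames):
    """Average the attended frame embeddings into one clip-level vector."""
    attended = self_attention(frames)
    return [sum(f[i] for f in attended) / len(attended) for i in range(len(attended[0]))]


frame_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy per-frame features
print(len(self_attention(frame_embeddings)))  # 3, one attended vector per frame
print(len(clip_embedding(frame_embeddings)))  # 2, one vector for the whole clip
```

A classifier applied to the clip-level vector then sees evidence from every frame rather than from a single, possibly ambiguous one.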

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Submit: Donahue, Jeffrey. CVPR (2015)

Paper: http://openaccess.thecvf.com/content_cvpr_2015/html/Donahue_Long-Term_Recurrent_Convolutional_2015_CVPR_paper.html

Blog: https://jay.tech.blog/2017/02/01/recurrent-convolutional-networks/

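LRCN accepts variable-length inputs, such as video, and variable-length outputs, and can model complex sequential dynamics.
Combines convolutional layers with long-range temporal recursion.
Connects a visual convolutional model directly to a deep LSTM network.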
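
The "CNN per frame, then LSTM over time" wiring can be sketched with a tiny LSTM cell written in plain Python. This is an illustration only: the hidden state is a single scalar, `toy_cnn` is a hypothetical stand-in for a real convolutional feature extractor, and the parameter values are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, params):
    """One LSTM step with a scalar hidden state; params maps gate name -> (w_x, w_h, b)."""
    def gate(name, activation):
        w_x, w_h, b = params[name]
        return activation(sum(w * xi for w, xi in zip(w_x, x)) + w_h * h + b)

    i = gate("input", sigmoid)
    f = gate("forget", sigmoid)
    o = gate("output", sigmoid)
    g = gate("cell", math.tanh)
    c = f * c + i * g          # update the memory cell
    h = o * math.tanh(c)       # expose part of it as the hidden state
    return h, c


def toy_cnn(frame):
    """Hypothetical stand-in for a CNN: two hand-made features per frame."""
    return [sum(frame) / len(frame), max(frame)]


def run_lrcn(frames, params):
    """Feed per-frame features through the LSTM; works for any number of frames."""
    h, c = 0.0, 0.0
    outputs = []
    for frame in frames:
        h, c = lstm_step(toy_cnn(frame), h, c, params)
        outputs.append(h)
    return outputs


params = {name: ([0.5, -0.5], 0.1, 0.0) for name in ("input", "forget", "output", "cell")}
video = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0], [2.0, 0.0, 4.0], [3.0, 3.0, 3.0]]
print(len(run_lrcn(video, params)))      # 4
print(len(run_lrcn(video[:2], params)))  # 2
```

Because the same cell is applied at every time step, the number of frames does not have to be fixed in advance.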

Photo Wake-Up: 3D Character Animation from a Single Photo

Submit: Weng, Chung-Yi, Brian Curless, and Ira Kemelmacher-Shlizerman. CVPR (2019)

Paper: http://openaccess.thecvf.com/content_CVPR_2019/html/Weng_Photo_Wake-Up_3D_Character_Animation_From_a_Single_Photo_CVPR_2019_paper.html

An application that shows the person in a single photo in 3D.
Proposes a 2D warping method that deforms a body model so that it can be animated.
Consists of mesh construction, self-occlusion handling, and final steps.

Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs

Submit: Shou, Zheng, Dongang Wang, and Shih-Fu Chang. CVPR (2016)

Paper: https://arxiv.org/abs/1601.02129

Proposal network: identifies candidate segments that may contain actions in a long video.
Classification network: learns one-vs-all action classification and is used to initialize the localization network.
Localization network: fine-tunes the trained classification network to localize each action.
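
The segment-level bookkeeping around these three networks can be sketched without any neural network at all: generate multi-scale candidate segments, score them (scores are made up here), and suppress overlapping detections. The window lengths, the 75% overlap, the IoU threshold, and all function names are assumptions for illustration.

```python
def sliding_windows(num_frames, lengths=(16, 32, 64), overlap=0.75):
    """Multi-scale candidate segments as (start, end) frame indices, end exclusive."""
    segments = []
    for length in lengths:
        stride = max(1, int(length * (1 - overlap)))
        for start in range(0, num_frames - length + 1, stride):
            segments.append((start, start + length))
    return segments


def temporal_iou(a, b):
    """Intersection over union of two temporal segments."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union


def nms(scored_segments, threshold=0.5):
    """Keep the highest-scoring segments, dropping ones that overlap a kept segment."""
    kept = []
    for seg, score in sorted(scored_segments, key=lambda p: p[1], reverse=True):
        if all(temporal_iou(seg, k) <= threshold for k, _ in kept):
            kept.append((seg, score))
    return kept


print(len(sliding_windows(32, lengths=(16,))))  # 5
detections = [((0, 16), 0.9), ((4, 20), 0.8), ((40, 56), 0.7)]
print(nms(detections))  # [((0, 16), 0.9), ((40, 56), 0.7)]
```

In the full pipeline, the scores passed to `nms` would come from the localization network rather than being fixed by hand.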

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

Submit: Carreira, Joao, and Andrew Zisserman. CVPR (2017)

Paper: http://openaccess.thecvf.com/content_cvpr_2017/html/Carreira_Quo_Vadis_Action_CVPR_2017_paper.html

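Kinetics: proposes a dataset of about 300,000 video clips covering 400 classes, with at least 400 clips per class, each about 10 seconds long.
I3D model: extends pre-trained 2D convolution filters along the time axis by copying their weights.
Transfer learning from a large dataset such as Kinetics proved effective.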
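
Inflating a 2D filter along the time axis can be sketched as below. The example repeats the 2D weights `time_size` times and divides by `time_size`; the rescaling is an assumption added here so that a video made of identical frames produces the same response as the original 2D filter on a single frame. Function names and toy values are hypothetical.

```python
def inflate(filter_2d, time_size):
    """Copy a 2D filter time_size times along time, rescaled by 1 / time_size."""
    return [[[w / time_size for w in row] for row in filter_2d] for _ in range(time_size)]


def apply_2d(filter_2d, patch):
    """Response of a 2D filter on one image patch of the same size."""
    return sum(w * p for frow, prow in zip(filter_2d, patch) for w, p in zip(frow, prow))


def apply_3d(filter_3d, clip):
    """Response of a 3D filter on a stack of patches (one per time step)."""
    return sum(apply_2d(f, frame) for f, frame in zip(filter_3d, clip))


filter_2d = [[1.0, 2.0], [3.0, 4.0]]   # stands in for a pre-trained 2D filter
patch = [[1.0, 0.0], [0.0, 1.0]]
static_clip = [patch] * 4              # the same frame repeated 4 times

filter_3d = inflate(filter_2d, time_size=4)
print(apply_2d(filter_2d, patch))        # 5.0
print(apply_3d(filter_3d, static_clip))  # 5.0
```

Starting from weights that already respond sensibly to still images gives the 3D network a useful initialization before it is trained on video.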

You Only Look Once: Unified, Real-Time Object Detection

Submit: Redmon, Joseph. CVPR (2016)

Paper: https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Redmon_You_Only_Look_CVPR_2016_paper.html

Git: https://github.com/qqwweee/keras-yolo3

Reframes object detection as a regression problem.
Features are extracted in a single feedforward pass: the network looks at the global context rather than only part of the image.
Real-time detection is possible (45 FPS).
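
Treating detection as regression means the network outputs one fixed-size tensor that is decoded into boxes. The sketch below assumes an S x S grid with B boxes per cell and C classes (S = 7, B = 2, C = 20 here), with box centers given as offsets inside a cell and sizes relative to the whole image. The function names and the sample prediction are hypothetical.

```python
S, B, C = 7, 2, 20  # grid size, boxes per cell, number of classes (assumed values)


def output_size(s=S, b=B, c=C):
    """Length of the regression target: each cell predicts B boxes (5 numbers) plus C class scores."""
    return s * s * (b * 5 + c)


def decode_box(row, col, pred, s=S):
    """Convert one cell prediction (x, y, w, h, confidence) to image-relative corners."""
    x, y, w, h, conf = pred
    cx = (col + x) / s  # center x as a fraction of the image width
    cy = (row + y) / s  # center y as a fraction of the image height
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2), conf


def class_confidence(box_conf, class_probs):
    """Class-specific score for every class: box confidence times class probability."""
    return [box_conf * p for p in class_probs]


print(output_size())  # 1470
box, conf = decode_box(row=3, col=3, pred=(0.5, 0.5, 0.2, 0.4, 0.9))
print(box, conf)
```

Every cell is decoded from the same forward pass, which is what makes the single-shot, real-time behavior possible.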
