Publications / I-Sheng Fang

2026 International Journal of Computer Vision (IJCV)

US³Net: Ultralightweight Self-Supervised Stereo Matching Network using Depth-Aware Geometric Soft Occlusion

Po-Chung Jen, Tzu-Chi Liu, I-Sheng Fang, Hsiao-Chieh Wen, Chia-Lun Hsu, Ping-Yang Chen, Chang-Hsing Lee, Yong-Sheng Chen

An ultralightweight self-supervised stereo matching network for efficient depth estimation on resource-constrained devices, pairing a low-complexity feature extractor with Depth-Aware Geometric Soft Occlusion (DAGSO) to improve occlusion handling with only 12K parameters.

Paper

Adapt2Hide teaser showing reversible visual processing with an off-the-shelf autoencoder.

2026 IEEE International Conference on Image Processing (ICIP)

Adapt2Hide: Leveraging Off-the-Shelf Autoencoder for Reversible Visual Processing

Ernie Chu^*, I-Sheng Fang^*, Tai-Ming Huang, Pin-Yen Chiu, Vishal Patel, Jun-Cheng Chen

^* Equal contribution.

An image steganography approach that adapts large pre-trained autoencoders with LoRA, enabling high-quality message reconstruction for reversible visual processing with minimal additional parameters.

Gen-n-Val teaser showing agentic image data generation and validation workflow.

2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition Findings Track (CVPRF)

Gen-n-Val: Agentic Image Data Generation and Validation

Jing-En Huang^*, I-Sheng Fang^*, Tzuhsuan Huang, Chih-Yu Wang, Jun-Cheng Chen

^* Equal contribution.

An agentic framework for image data generation and validation that uses Layer Diffusion, LLM prompt agents, and VLLM validation agents to improve training data for object detection and segmentation.

Paper

2026 International Conference on Learning Representations (ICLR)

ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks

Samin Mahdizadeh Sani^*, Max Ku^*, Nima Jamali, Matina Mahdizadeh Sani, Paria Khoshtab, Wei-Chieh Sun, Parnian Fazel, Zhi Rui Tam, Thomas Chong, Edisy Kin Wai Chan, Donald Wai Tong Tsang, Chiao-Wei Hsu, Ting Wai Lam, Ho Yin Sam Ng, Chiafeng Chu, Chak-Wing Mak, Keming Wu, Hiu Tung Wong, Yik Chun Ho, Chi Ruan, Zhuofeng Li, I-Sheng Fang, Shih-Ying Yeh, Ho Kei Cheng, Ping Nie, Wenhu Chen

^* Equal contribution.

A large-scale benchmark for evaluating modern image-generation models across open-ended real-world tasks with fine-grained human annotations.

Text Slider teaser showing continuous visual concept control.

2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image/Video Synthesis via LoRA Adapters

Pin-Yen Chiu, I-Sheng Fang, Jun-Cheng Chen

A lightweight and efficient method for continuous concept control in diffusion models by fine-tuning low-rank directions in the text encoder.

2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

KMOPS: Keypoint-Driven Method for Multi-Object Pose and Metric Size Estimation from Stereo Images

Ying-Kun Wu, Yi Shen, Tzuhsuan Huang, I-Sheng Fang, Jun-Cheng Chen

A keypoint-driven method for estimating 6-DoF pose and metric size of multiple objects from a calibrated stereo image pair.

Project
Code

2025 NeurIPS 2025 Workshop on SPACE in Vision, Language, and Embodied AI (SpaVLE)

Every Camera Effect, Every Time, All at Once: 4D Gaussian Ray Tracing for Physics-based Camera Effect Data Generation

Yi-Ruei Liu^*, You-Zhe Xie^*, Yu-Hsiang Hsu^*, I-Sheng Fang^†, Yu-Lun Liu, Jun-Cheng Chen

^* Equal contribution. Work done at Academia Sinica as interns. ^† Internship mentor.

A two-stage pipeline combining 4D Gaussian Splatting with physically based ray tracing to simulate real-world camera effects such as fisheye distortion, rolling shutter, and depth of field.

2025 Multimodal Algorithmic Reasoning Workshop (MAR), IEEE/CVF CVPR Workshops

CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography

I-Sheng Fang, Jun-Cheng Chen

A benchmark for photography-related visual reasoning tasks that test how multimodal large language models understand the effects of camera settings on image appearance.

Paper
MAR

2024 SIGGRAPH Asia

Camera Settings as Tokens: Modeling Photography on Latent Diffusion Models

I-Sheng Fang, Yue-Hua Han, Jun-Cheng Chen

A latent diffusion approach that represents camera settings as controllable tokens for photographic image generation.

iToF and RGB depth integration teaser animation.

2024 International Conference on Pattern Recognition (ICPR)

Best of Both Sides: Integration of Absolute and Relative Depth Sensing Modalities Based on iToF and RGB Cameras

I-Sheng Fang, Wei-Chen Chiu, Yong-Sheng Chen

A depth-sensing integration method that combines active iToF sensing with passive RGB cues to estimate high-resolution metric depth without metric depth supervision.

Paper
Code

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

ES3Net: Accurate and Efficient Edge-Based Self-Supervised Stereo Matching Network

I-Sheng Fang, Hsiao-Chieh Wen, Chia-Lun Hsu, Po-Chung Jen, Ping-Yang Chen, Yong-Sheng Chen

An efficient edge-based self-supervised stereo matching network for robust depth estimation on drones and embedded devices.

CVPRW
Code

Self-contained stylization result animation.

2020 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Self-Contained Stylization via Steganography for Reverse and Serial Style Transfer

Hung-Yu Chen^*, I-Sheng Fang^*, Chia-Ming Cheng, Wei-Chen Chiu

^* Equal contribution.

A two-stage model that integrates neural style transfer and deep steganography to support reverse and serial style transfer.

2022 IEEE Signal Processing Letters

Single Image Reflection Removal Based on Knowledge-Distilling Content Disentanglement

Yan-Tsung Peng, Kai-Han Cheng, I-Sheng Fang, Wen-Yi Peng, Jr-Shian Wu

A single-image reflection removal method that disentangles reflection and transmission features with knowledge distillation.