Qin Ren

Hi there 👋. I am Qin Ren (I also go by Lyra), a second-year PhD student in Computer Science at Stony Brook University, where I am fortunate to be advised by Prof. Chenyu You.

I received my B.S. from Huazhong University of Science and Technology and my M.S. from Tsinghua University.

Previously, I interned at OpenMMLab, contributing to self-supervised learning tools and training pipelines. I also interned at Tencent AI Lab, working on AI4Science research.

Email: qin.ren _at_ stonybrook.edu

Research Interests

My research aims to scientifically understand how intelligence emerges. In particular, I focus on human-like intelligence that acquires abstraction, adaptivity, and generalization through interaction in rich environments, with the broader goal of understanding how complex cognitive capabilities emerge from data, experience, and structure. Questions we are currently studying include:

News

Publications

[* = co-first authors]

Scale Where It Matters: Training-Free Localized Scaling for Diffusion Models
Qin Ren, Yufei Wang, Lanqing Guo, Wen Zhang, Zhiwen Fan, Chenyu You
Preprint 2025

Diffusion models have become the dominant paradigm in text-to-image generation, and test-time scaling (TTS) further improves quality by allocating more computation during inference. However, existing TTS methods operate at the full-image level, overlooking the fact that image quality is often spatially heterogeneous. This leads to unnecessary computation on already satisfactory regions and insufficient correction of localized defects. In this paper, we explore a new direction - Localized TTS - that adaptively resamples defective regions while preserving high-quality regions, thereby substantially reducing the search space. This paradigm poses two central challenges: accurately localizing defects and maintaining global consistency. We propose LoTTS, the first fully training-free framework for localized TTS. For defect localization, LoTTS contrasts cross- and self-attention signals under quality-aware prompts (e.g., high-quality vs. low-quality) to identify defective regions, and then refines them into coherent masks. For consistency, LoTTS perturbs only defective regions and denoises them locally, ensuring that corrections remain confined while the rest of the image remains undisturbed. Extensive experiments on SD2.1, SDXL, and FLUX demonstrate that LoTTS achieves state-of-the-art performance: it consistently improves both local quality and global fidelity, while reducing GPU cost by 2-4x compared to Best-of-N sampling. These findings establish localized TTS as a promising new direction for scaling diffusion models at inference time.
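To give a flavor of how the localized loop fits together, here is a minimal, illustrative sketch; the helper names (defect_mask, perturb_and_denoise, score_fn) and the attention-contrast thresholding are my simplifying assumptions, not the actual LoTTS implementation.

```python
# Minimal sketch of localized test-time scaling (illustrative only; helper
# names and the attention-contrast mask are assumptions, not LoTTS's code).
import torch

def defect_mask(cross_attn_hq, cross_attn_lq, threshold=0.5):
    """Contrast attention maps under 'high-quality' vs 'low-quality' prompts.

    Regions that respond more strongly to the low-quality prompt are flagged
    as defects. Both inputs are (H, W) maps; the output is a binary (H, W) mask.
    """
    contrast = cross_attn_lq - cross_attn_hq
    contrast = torch.clamp(
        (contrast - contrast.min()) / (contrast.max() - contrast.min() + 1e-8), 0.0, 1.0
    )
    return (contrast > threshold).float()

def localized_resample(image, mask, perturb_and_denoise, score_fn, n_candidates=4):
    """Resample only masked regions, keeping the rest of the image untouched.

    perturb_and_denoise(image, mask) stands in for re-noising the masked region
    and running a few local denoising steps; score_fn is any image-quality scorer.
    """
    best, best_score = image, score_fn(image)
    for _ in range(n_candidates):
        candidate = perturb_and_denoise(image, mask)
        # Stitch: corrected pixels inside the mask, original pixels outside.
        candidate = mask * candidate + (1.0 - mask) * image
        score = score_fn(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best
```

The point of the sketch is only that search and denoising effort is spent inside the mask rather than over the whole image.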

Supervise Less, See More: Training-free Nuclear Instance Segmentation with Prototype-Guided Prompting
Wen Zhang, Qin Ren, Wenjing Liu, Haibin Ling, Chenyu You
Preprint 2025

Accurate nuclear instance segmentation is a pivotal task in computational pathology, supporting data-driven clinical insights and facilitating downstream translational applications. While large vision foundation models have shown promise for zero-shot biomedical segmentation, most existing approaches still depend on dense supervision and computationally expensive fine-tuning. Consequently, training-free methods present a compelling research direction, yet remain largely unexplored. In this work, we introduce SPROUT, a fully training- and annotation-free prompting framework for nuclear instance segmentation. SPROUT leverages histology-informed priors to construct slide-specific reference prototypes that mitigate domain gaps. These prototypes progressively guide feature alignment through a partial optimal transport scheme. The resulting foreground and background features are transformed into positive and negative point prompts, enabling the Segment Anything Model (SAM) to produce precise nuclear delineations without any parameter updates. Extensive experiments across multiple histopathology benchmarks demonstrate that SPROUT achieves competitive performance without supervision or retraining, establishing a novel paradigm for scalable, training-free nuclear instance segmentation in pathology.
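The prompting step can be pictured with a small sketch like the one below, which reduces the paper's partial optimal transport matching to plain cosine similarity; all function and variable names here are hypothetical, and this is not the SPROUT code.

```python
# Illustrative sketch: prototype-guided point prompts for SAM (assumed names;
# the partial optimal transport step is simplified to cosine similarity).
import numpy as np

def point_prompts_from_prototypes(patch_feats, coords, fg_proto, bg_proto, top_k=20):
    """Convert patch features into positive/negative point prompts.

    patch_feats: (N, D) L2-normalized features of image patches
    coords:      (N, 2) pixel coordinates of patch centers
    fg_proto, bg_proto: (D,) slide-specific foreground / background prototypes
    Returns (points, labels) in the format SAM's predictor expects:
    points (2K, 2), labels (2K,) with 1 = foreground, 0 = background.
    """
    fg_sim = patch_feats @ fg_proto            # (N,)
    bg_sim = patch_feats @ bg_proto            # (N,)
    margin = fg_sim - bg_sim                   # >0 leans nuclear, <0 leans background
    pos_idx = np.argsort(-margin)[:top_k]      # most nucleus-like patches
    neg_idx = np.argsort(margin)[:top_k]       # most background-like patches
    points = np.concatenate([coords[pos_idx], coords[neg_idx]], axis=0)
    labels = np.concatenate([np.ones(top_k), np.zeros(top_k)]).astype(int)
    return points, labels
```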

Together, Then Apart: Revisiting Multimodal Survival Analysis via a Min-Max Perspective
Wenjing Liu, Qin Ren, Wen Zhang, Yuewei Lin, Chenyu You
Preprint 2025

Integrating heterogeneous modalities such as histopathology and genomics is central to advancing survival analysis, yet most existing methods prioritize cross-modal alignment through attention-based fusion mechanisms, often at the expense of modality-specific characteristics. This overemphasis on alignment leads to representation collapse and reduced diversity. In this work, we revisit multimodal survival analysis via the dual lens of alignment and distinctiveness, positing that preserving modality-specific structure is as vital as achieving semantic coherence. We introduce Together-Then-Apart (TTA), a unified min-max optimization framework that simultaneously models shared and modality-specific representations. The Together stage minimizes semantic discrepancies by aligning embeddings via shared prototypes, guided by an unbalanced optimal transport objective that adaptively highlights informative tokens. The Apart stage maximizes representational diversity through modality anchors and a contrastive regularizer that preserve unique modality information and prevent feature collapse. Extensive experiments on five TCGA benchmarks show that TTA consistently outperforms state-of-the-art methods. Beyond empirical gains, our formulation provides a new theoretical perspective on how alignment and distinctiveness can be jointly achieved for robust, interpretable, and biologically meaningful multimodal survival analysis.
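As a rough illustration of the objective, the sketch below condenses the min-max structure into a single penalized loss and omits the unbalanced-OT token weighting; the function names, shapes, and loss choices are assumptions, not the TTA implementation.

```python
# Rough sketch: alignment to shared prototypes plus a penalty that keeps
# modality anchors from collapsing onto each other (illustrative only).
import torch.nn.functional as F

def together_loss(path_tokens, gene_tokens, prototypes):
    """Pull both modalities toward a shared prototype space (the 'Together' term)."""
    def assignment(tokens):
        sim = F.normalize(tokens, dim=-1) @ F.normalize(prototypes, dim=-1).T
        return sim.softmax(dim=-1).mean(0)          # average soft prototype assignment
    p, g = assignment(path_tokens), assignment(gene_tokens)
    return F.kl_div(p.log(), g, reduction="sum")    # discrepancy between assignments

def apart_penalty(path_anchor, gene_anchor):
    """Similarity between modality anchors; keeping it low preserves
    modality-specific structure (the 'Apart' term)."""
    return F.cosine_similarity(path_anchor, gene_anchor, dim=-1).mean()

def tta_objective(path_tokens, gene_tokens, prototypes,
                  path_anchor, gene_anchor, lam=0.5):
    # Minimize alignment discrepancy while penalizing anchor collapse.
    return (together_loss(path_tokens, gene_tokens, prototypes)
            + lam * apart_penalty(path_anchor, gene_anchor))
```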

OTSurv: A Novel Multiple Instance Learning Framework for Survival Prediction with Heterogeneity-aware Optimal Transport
Qin Ren, Yifan Wang, Ruogu Fang, Haibin Ling, Chenyu You
MICCAI 2025

Survival prediction using whole slide images (WSIs) can be formulated as a multiple instance learning (MIL) problem. However, existing MIL methods often fail to explicitly capture pathological heterogeneity within WSIs, both globally, through long-tailed morphological distributions, and locally, through tile-level prediction uncertainty. Optimal transport (OT) provides a principled way of modeling such heterogeneity by incorporating marginal distribution constraints. Building on this insight, we propose OTSurv, a novel MIL framework from an optimal transport perspective. Specifically, OTSurv formulates survival prediction as a heterogeneity-aware OT problem with two constraints: (1) a global long-tail constraint that models prior morphological distributions to avert both mode collapse and excessive uniformity by regulating transport mass allocation, and (2) a local uncertainty-aware constraint that prioritizes high-confidence patches while suppressing noise by progressively raising the total transport mass. We then recast the initial OT problem, augmented by these constraints, into an unbalanced OT formulation that can be solved with an efficient, hardware-friendly matrix scaling algorithm. Empirically, OTSurv sets new state-of-the-art results across six popular benchmarks, achieving an absolute 3.6% improvement in average C-index. In addition, OTSurv achieves statistical significance in log-rank tests and offers high interpretability, making it a powerful tool for survival prediction in digital pathology. Our codes are available at https://github.com/Y-Research-SBU/OTSurv.
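For intuition about the solver, here is a generic entropic matrix-scaling (Sinkhorn-style) routine of the kind the abstract alludes to; the cost matrix, marginals, and epsilon are placeholders, and this is not the released OTSurv code (see the repository above for that).

```python
# Generic entropic matrix-scaling (Sinkhorn-style) solver, for intuition only;
# OTSurv's exact unbalanced formulation and constraints differ.
import torch

def sinkhorn(cost, row_mass, col_mass, eps=0.1, n_iters=100):
    """Solve entropic OT by alternating row/column scaling.

    cost:     (N, K) patch-to-prototype cost matrix
    row_mass: (N,) per-patch mass (e.g. down-weighted for uncertain tiles)
    col_mass: (K,) prototype mass (e.g. a long-tailed morphological prior)
    Returns a transport plan T with T.sum(1) ~ row_mass and T.sum(0) ~ col_mass.
    """
    G = torch.exp(-cost / eps)                    # Gibbs kernel
    u = torch.ones_like(row_mass)
    v = torch.ones_like(col_mass)
    for _ in range(n_iters):
        u = row_mass / (G @ v + 1e-12)            # scale rows toward row_mass
        v = col_mass / (G.T @ u + 1e-12)          # scale columns toward col_mass
    return u[:, None] * G * v[None, :]

# Example: 6 patches, 3 prototypes, uniform patch mass, long-tailed prototype prior.
cost = torch.rand(6, 3)
plan = sinkhorn(cost, torch.full((6,), 1 / 6), torch.tensor([0.6, 0.3, 0.1]))
```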

Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering
Shanlin Sun*, Yifan Wang*, Hanwen Zhang*, Yifeng Xiong, Qin Ren, Ruogu Fang, Xiaohui Xie, Chenyu You
ICCV 2025

While multi-step diffusion models have advanced both forward and inverse rendering, existing approaches often treat these problems independently, leading to cycle inconsistency and slow inference speed. In this work, we present Ouroboros, a framework composed of two single-step diffusion models that handle forward and inverse rendering with mutual reinforcement. Our approach extends intrinsic decomposition to both indoor and outdoor scenes and introduces a cycle consistency mechanism that ensures coherence between forward and inverse rendering outputs. Experimental results demonstrate state-of-the-art performance across diverse scenes while achieving substantially faster inference speed compared to other diffusion-based methods. We also demonstrate that Ouroboros can transfer to video decomposition in a training-free manner, reducing temporal inconsistency in video sequences while maintaining high-quality per-frame inverse rendering.
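The cycle-consistency idea can be written down in a few lines; in the sketch below, forward_model and inverse_model stand in for the two single-step diffusion networks, and the L1 losses are an illustrative choice rather than the paper's exact training objective.

```python
# Sketch of the cycle-consistency mechanism (illustrative names and losses).
import torch.nn.functional as F

def cycle_losses(image, intrinsics, forward_model, inverse_model):
    """Reconstruction in both directions keeps the two renderers consistent.

    image -> inverse -> forward should reproduce the image;
    intrinsics -> forward -> inverse should reproduce the intrinsics.
    """
    pred_intrinsics = inverse_model(image)                         # e.g. albedo, normals
    image_cycle = F.l1_loss(forward_model(pred_intrinsics), image)
    pred_image = forward_model(intrinsics)
    intrinsics_cycle = F.l1_loss(inverse_model(pred_image), intrinsics)
    return image_cycle + intrinsics_cycle
```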

Deep learning using histological images for gene mutation prediction in lung cancer: a multicentre retrospective study
Yu Zhao*, Shan Xiong*, Qin Ren*, Jun Wang*, Min Li*, Lin Yang*, et al.
The Lancet Oncology 2024 (IF: 51.1)

Summary

Background Accurate detection of driver gene mutations is crucial for treatment planning and predicting prognosis for patients with lung cancer. Conventional genomic testing requires high-quality tissue samples and is time-consuming and resource-consuming, and as a result, is not available for most patients, especially those in low-resource settings. We aimed to develop an annotation-free Deep learning-enabled artificial intelligence method to predict GEne Mutations (DeepGEM) from routinely acquired histological slides.

Methods In this multicentre retrospective study, we collected data for patients with lung cancer who had a biopsy and multigene next-generation sequencing done at 16 hospitals in China (with no restrictions on age, sex, or histology type), to form a large multicentre dataset comprising paired pathological image and multiple gene mutation information. We also included patients from The Cancer Genome Atlas (TCGA) publicly available dataset. Our developed model is an instance-level and bag-level co-supervised multiple instance learning method with label disambiguation design. We trained and initially tested the DeepGEM model on the internal dataset (patients from the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China), and further evaluated it on the external dataset (patients from the remaining 15 centres) and the public TCGA dataset. Additionally, a dataset of patients from the same medical centre as the internal dataset, but without overlap, was used to evaluate the model's generalisation ability to biopsy samples from lymph node metastases. The primary objective was the performance of the DeepGEM model in predicting gene mutations (area under the curve [AUC] and accuracy) in the four prespecified groups (ie, the hold-out internal test set, multicentre external test set, TCGA set, and lymph node metastases set).

Findings Assessable pathological images and multigene testing information were available for 3697 patients who had biopsy and multigene next-generation sequencing done between Jan 1, 2018, and March 31, 2022, at the 16 centres. We excluded 60 patients with low-quality images. We included 3767 images from 3637 consecutive patients (1978 [54·4%] men, 1514 [41·6%] women, 145 [4·0%] unknown; median age 60 years [IQR 52–67]), with 1716 patients in the internal dataset, 1718 patients in the external dataset, and 203 patients in the lymph node metastases dataset. The DeepGEM model showed robust performance in the internal dataset: for excisional biopsy samples, AUC values for gene mutation prediction ranged from 0·90 (95% CI 0·77–1·00) to 0·97 (0·93–1·00) and accuracy values ranged from 0·91 (0·85–0·98) to 0·97 (0·93–1·00); for aspiration biopsy samples, AUC values ranged from 0·85 (0·80–0·91) to 0·95 (0·86–1·00) and accuracy values ranged from 0·79 (0·74–0·85) to 0·99 (0·98–1·00). In the multicentre external dataset, for excisional biopsy samples, AUC values ranged from 0·80 (95% CI 0·75–0·85) to 0·91 (0·88–1·00) and accuracy values ranged from 0·79 (0·76–0·82) to 0·95 (0·93–0·96); for aspiration biopsy samples, AUC values ranged from 0·76 (0·70–0·83) to 0·87 (0·80–0·94) and accuracy values ranged from 0·76 (0·74–0·79) to 0·97 (0·96–0·98). The model also showed strong performance on the TCGA dataset (473 patients; 535 slides; AUC values ranged from 0·82 [95% CI 0·71–0·93] to 0·96 [0·91–1·00], accuracy values ranged from 0·79 [0·70–0·88] to 0·95 [0·90–1·00]). The DeepGEM model, trained on primary region biopsy samples, could be generalised to biopsy samples from lymph node metastases, with AUC values of 0·91 (95% CI 0·88–0·94) for EGFR and 0·88 (0·82–0·93) for KRAS and accuracy values of 0·85 (0·80–0·88) for EGFR and 0·95 (0·92–0·96) for KRAS and showed potential for prognostic prediction of targeted therapy. The model generated spatial gene mutation maps, indicating gene mutation spatial distribution.

Interpretation We developed an AI-based method that can provide an accurate, timely, and economical prediction of gene mutation and mutation spatial distribution. The method showed substantial potential as an assistive tool for guiding the clinical treatment of patients with lung cancer.

A label disambiguation-based multimodal massive multiple instance learning approach for immune repertoire classification
Fan Xu*, Yu Zhao*, Bingzhe Wu, Yueshan Huang, Qin Ren, Yang Xiao, Bing He, Jie Zheng, Jianhua Yao
AAAI 2024

One individual human's immune repertoire consists of a huge set of adaptive immune receptors at a certain time point, representing the individual's adaptive immune state. Immune repertoire classification and associated receptor identification have the potential to make a transformative contribution to the development of novel vaccines and therapies. The vast number of instances and exceedingly low witness rate pose a great challenge to the immune repertoire classification, which can be formulated as a Massive Multiple Instance Learning (MMIL) problem. Traditional MIL methods, at both bag-level and instance-level, confront the issues of substantial computational burden or supervision ambiguity when handling massive instances. To address these issues, we propose a novel label disambiguation-based multimodal massive multiple instance learning approach (LaDM³IL) for immune repertoire classification. LaDM³IL adapts the instance-level MIL paradigm to deal with the issue of high computational cost and employs a specially-designed label disambiguation module for label correction, mitigating the impact of misleading supervision. To achieve a more comprehensive representation of each receptor, LaDM³IL leverages a multimodal fusion module with gating-based attention and tensor-fusion to integrate the information from gene segments and amino acid (AA) sequences of each immune receptor. Extensive experiments on the Cytomegalovirus (CMV) and Cancer datasets demonstrate the superior performance of the proposed LaDM³IL for both immune repertoire classification and associated receptor identification tasks. The code is publicly available at https://github.com/Josie-xufan/LaDM3IL.
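As an illustration of the fusion step, here is a sketch of gating-based attention followed by tensor fusion; the module name, shapes, and exact gating form are my assumptions, not the released LaDM³IL code (see the repository above).

```python
# Illustrative gating-based attention + tensor-fusion module (assumed design).
import torch
import torch.nn as nn

class GatedTensorFusion(nn.Module):
    """Fuse gene-segment and amino-acid-sequence embeddings of one receptor."""

    def __init__(self, dim):
        super().__init__()
        self.gate_gene = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.gate_seq = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.proj = nn.Linear((dim + 1) * (dim + 1), dim)

    def forward(self, gene_emb, seq_emb):
        joint = torch.cat([gene_emb, seq_emb], dim=-1)
        gene_emb = self.gate_gene(joint) * gene_emb        # attention-style gating
        seq_emb = self.gate_seq(joint) * seq_emb
        ones = torch.ones(gene_emb.size(0), 1, device=gene_emb.device)
        # Outer product of [1; gene] and [1; seq] keeps uni- and bi-modal terms.
        outer = torch.einsum("bi,bj->bij",
                             torch.cat([ones, gene_emb], dim=-1),
                             torch.cat([ones, seq_emb], dim=-1))
        return self.proj(outer.flatten(1))
```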

IIB-MIL: Integrated instance-level and bag-level multiple instances learning with label disambiguation for pathological image analysis
Qin Ren*, Yu Zhao*, Bing He, Bingzhe Wu, Sijie Mai, Fan Xu, Yueshan Huang, Yonghong He, Junzhou Huang, Jianhua Yao
MICCAI 2023

Digital pathology plays a pivotal role in the diagnosis and interpretation of diseases and has drawn increasing attention in modern healthcare. Due to the huge gigapixel-level size and diverse nature of whole-slide images (WSIs), analyzing them through multiple instance learning (MIL) has become a widely-used scheme, which, however, faces the challenges that come with the weakly supervised nature of MIL. Conventional MIL methods mostly utilize either instance-level or bag-level supervision to learn informative representations from WSIs for downstream tasks. In this work, we propose a novel MIL method for pathological image analysis with integrated instance-level and bag-level supervision (termed IIB-MIL). More importantly, to overcome the weakly supervised nature of MIL, we design a label-disambiguation-based instance-level supervision for MIL using Prototypes and Confidence Bank to reduce the impact of noisy labels. Extensive experiments demonstrate that IIB-MIL outperforms state-of-the-art approaches on both benchmark datasets and a challenging practical clinical task. The code is available at https://github.com/TencentAILabHealthcare/IIB-MIL.
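To make the label-disambiguation idea concrete, here is a rough sketch of a prototype-driven confidence bank; the class name, shapes, and EMA update rule are illustrative assumptions rather than the IIB-MIL implementation (the repository above has the real code).

```python
# Rough sketch of prototype-based label disambiguation with a confidence bank.
import torch
import torch.nn.functional as F

class ConfidenceBank:
    """Keeps a running soft label per instance, refined by prototype similarity."""

    def __init__(self, n_instances, n_classes, momentum=0.9):
        self.soft_labels = torch.full((n_instances, n_classes), 1.0 / n_classes)
        self.momentum = momentum

    @torch.no_grad()
    def update(self, idx, feats, prototypes):
        # Similarity to class prototypes -> per-instance pseudo label.
        sim = F.normalize(feats, dim=-1) @ F.normalize(prototypes, dim=-1).T
        pseudo = sim.softmax(dim=-1)
        # An exponential moving average smooths noisy per-step assignments.
        self.soft_labels[idx] = (self.momentum * self.soft_labels[idx]
                                 + (1 - self.momentum) * pseudo)
        return self.soft_labels[idx]   # used as disambiguated instance-level targets
```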

Multimodal-AIR-BERT: A Multimodal Pre-trained Model for Antigen Specificity Prediction in Adaptive Immune Receptors
Yang Xiao*, Yueshan Huang*, Yu Zhao*, Fan Xu, Qin Ren, Bing He, Jianhua Yao, Xiao Liu
BIBM 2023

The in silico prediction of antigen specificity in adaptive immune receptors (AIRs), such as T-cell receptors (TCRs), is essential for understanding immunological processes and developing targeted therapies. The V(D)J gene rearrangement is a critical biological process that generates diversity in amino acid (AA) sequences in antigen-binding regions, enabling AIRs to recognize a wide range of antigens from various pathogens and "altered self cells" observed in cancers. The huge diversity of AIRs presents a significant challenge to existing computational methods for antigen specificity prediction. To address these complexities, we introduce Multimodal-AIR-BERT, a novel multimodal pre-trained model aimed at enhancing the prediction of antigen-binding specificity in TCRs. It comprises a pre-trained sequence encoder, a gene encoder, and a multimodal fusion module with gating-based attention and tensor fusion to calibrate and integrate the V(D)J gene and AA sequence features of TCRs, thereby generating more informative representations. The integration of V(D)J gene information, which provides insights often unobtainable from sequences alone, benefits Multimodal-AIR-BERT in performance enhancement compared to its sequence-modality-only counterpart. Collectively, our work provides an advancement in the accurate prediction of antigen-binding specificity. As the precision of this specificity prediction improves, it can potentially pave the way for targeted immune therapies and deeper insights into the interactions within the immune system.

Projects
MMSelfSup: OpenMMLab Self-Supervised Learning Toolbox and Benchmark
I was one of the main contributors to this project during my internship at OpenMMLab.
Professional Services

Honors and Awards