Blog
Stories about data science, artificial intelligence, and deep learning
Posts 1-12 of 192
![[Paper Review] RealWonder: Real-Time Physical Action-Conditioned Video Generation](/assets/images/blog/20260308-paper-2603-05449-realwonder-real-time-physical-.jpg)
[Paper Review] RealWonder: Real-Time Physical Action-Conditioned Video Generation
Current video generation models cannot simulate physical consequences of 3D actions like forces and robotic manipulations, as they lack structural understanding of how actions affect 3D scenes. We pre...
![[Paper Review] Helios: Real Real-Time Long Video Generation Model](/assets/images/blog/20260308-paper-2603-04379-helios-real-real-time-long-vid.jpg)
[Paper Review] Helios: Real Real-Time Long Video Generation Model
We introduce Helios, the first 14B video generation model that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching the quality of a strong baseline. We mak...
![[Paper Review] Phi-4-reasoning-vision-15B Technical Report](/assets/images/blog/20260308-paper-2603-03975-phi-4-reasoning-vision-15b-tec.jpg)
[Paper Review] Phi-4-reasoning-vision-15B Technical Report
We present Phi-4-reasoning-vision-15B, a compact open-weight multimodal reasoning model, and share the motivations, design choices, experiments, and learnings that informed its development. Our goal i...
![[Paper Review] Beyond Language Modeling: An Exploration of Multimodal Pretraining](/assets/images/blog/20260308-paper-2603-03276-beyond-language-modeling-an-ex.jpg)
[Paper Review] Beyond Language Modeling: An Exploration of Multimodal Pretraining
The visual world offers a critical axis for advancing foundation models beyond language. Despite growing interest in this direction, the design space for native multimodal models remains opaque. We pr...
![[Paper Review] Chain of World: World Model Thinking in Latent Motion](/assets/images/blog/20260308-paper-2603-03195-chain-of-world-world-model-thi.jpg)
[Paper Review] Chain of World: World Model Thinking in Latent Motion
Vision-Language-Action (VLA) models are a promising path toward embodied intelligence, yet they often overlook the predictive and temporal-causal structure underlying visual dynamics. World-model VLAs...
![[Paper Review] Understanding LoRA as Knowledge Memory: An Empirical Analysis](/assets/images/blog/20260308-paper-2603-01097-understanding-lora-as-knowledg.jpg)
[Paper Review] Understanding LoRA as Knowledge Memory: An Empirical Analysis
Continuous knowledge updating for pre-trained large language models (LLMs) is increasingly necessary yet remains challenging. Although inference-time methods like In-Context Learning (ICL) and Retriev...
![[Paper Review] EvoSkill: Automated Skill Discovery for Multi-Agent Systems](/assets/images/blog/20260307-paper-2603-02766-evoskill-automated-skill-disco.jpg)
[Paper Review] EvoSkill: Automated Skill Discovery for Multi-Agent Systems
Coding agents are increasingly used as general-purpose problem solvers, but their flexibility does not by itself confer the domain expertise needed for specialized tasks. Recent work addresses this th...
![[Paper Review] Evaluating Theory of Mind and Internal Beliefs in LLM-Based Multi-Agent Systems](/assets/images/blog/20260307-paper-2603-00142-evaluating-theory-of-mind-and-.jpg)
[Paper Review] Evaluating Theory of Mind and Internal Beliefs in LLM-Based Multi-Agent Systems
LLM-based MAS are gaining popularity due to their potential for collaborative problem-solving enhanced by advances in natural language comprehension, reasoning, and planning. Research in Theory of Min...
![[Paper Review] ParamMem: Augmenting Language Agents with Parametric Reflective Memory](/assets/images/blog/20260307-paper-2602-23320-parammem-augmenting-language-a.jpg)
[Paper Review] ParamMem: Augmenting Language Agents with Parametric Reflective Memory
Self-reflection enables language agents to iteratively refine solutions, yet often produces repetitive outputs that limit reasoning performance. Recent studies have attempted to address this limitatio...
![[Paper Review] Large-scale online deanonymization with LLMs](/assets/images/blog/20260307-paper-2602-16800-large-scale-online-deanonymiza.jpg)
[Paper Review] Large-scale online deanonymization with LLMs
We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at hig...
![[Paper Review] From SGD to Spectra: A Theory of Neural Network Weight Dynamics](/assets/images/blog/20260307-paper-2507-12709-from-sgd-to-spectra-a-theory-o.jpg)
[Paper Review] From SGD to Spectra: A Theory of Neural Network Weight Dynamics
Deep neural networks have revolutionized machine learning, yet their training dynamics remain theoretically unclear. We develop a continuous-time, matrix-valued stochastic differential equation (SDE) f...
