Blog
Stories about data science, artificial intelligence, and deep learning
Posts 1–12 of 202
![[Paper Review] LLM2Vec-Gen: Generative Embeddings from Large Language Models](/assets/images/blog/20260315-paper-2603-10913-llm2vec-gen-generative-embeddi.jpg)
[Paper Review] LLM2Vec-Gen: Generative Embeddings from Large Language Models
LLM-based text embedders typically encode the semantic content of their input. However, embedding tasks require mapping diverse inputs to similar outputs. This input-output mismatch is typically addressed by ...
![[Paper Review] OpenClaw-RL: Train Any Agent Simply by Talking](/assets/images/blog/20260315-paper-2603-10165-openclaw-rl-train-any-agent-si.jpg)
[Paper Review] OpenClaw-RL: Train Any Agent Simply by Talking
Every agent interaction generates a next-state signal, namely the user reply, tool output, terminal or GUI state change that follows each action, yet no existing agentic RL system recovers it as a liv...
![[Paper Review] Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization](/assets/images/blog/20260312-paper-2603-10808-nurture-first-agent-developmen.jpg)
[Paper Review] Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization
The emergence of large language model (LLM)-based agent frameworks has shifted the primary challenge in building domain-expert AI agents from raw capability to effective encoding of domain expertise. ...
![[Paper Review] Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)](/assets/images/blog/20260312-paper-2510-22954-artificial-hivemind-the-open-e.jpg)
[Paper Review] Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
Language models (LMs) often struggle to generate diverse, human-like creative content, raising concerns about the long-term homogenization of human thought through repeated exposure to similar outputs...
![[Paper Review] Cache Mechanism for Agent RAG Systems](/assets/images/blog/20260310-paper-2511-02919-cache-mechanism-for-agent-rag-.jpg)
[Paper Review] Cache Mechanism for Agent RAG Systems
Recent advances in Large Language Model (LLM)-based agents have been propelled by Retrieval-Augmented Generation (RAG), which grants the models access to vast external knowledge bases. Despite RAG's s...
![[Paper Review] Why Is Anything Conscious?](/assets/images/blog/20260310-paper-2409-14545-why-is-anything-conscious.jpg)
[Paper Review] Why Is Anything Conscious?
We tackle the problem of consciousness by taking the naturally selected, embodied organism as our starting point. We provide a formalism describing how biological systems such as human bodies self-org...
![[Paper Review] RealWonder: Real-Time Physical Action-Conditioned Video Generation](/assets/images/blog/20260308-paper-2603-05449-realwonder-real-time-physical-.jpg)
[Paper Review] RealWonder: Real-Time Physical Action-Conditioned Video Generation
Current video generation models cannot simulate physical consequences of 3D actions like forces and robotic manipulations, as they lack structural understanding of how actions affect 3D scenes. We pre...
![[Paper Review] Helios: Real Real-Time Long Video Generation Model](/assets/images/blog/20260308-paper-2603-04379-helios-real-real-time-long-vid.jpg)
[Paper Review] Helios: Real Real-Time Long Video Generation Model
We introduce Helios, the first 14B video generation model that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching the quality of a strong baseline. We mak...
![[Paper Review] Phi-4-reasoning-vision-15B Technical Report](/assets/images/blog/20260308-paper-2603-03975-phi-4-reasoning-vision-15b-tec.jpg)
[Paper Review] Phi-4-reasoning-vision-15B Technical Report
We present Phi-4-reasoning-vision-15B, a compact open-weight multimodal reasoning model, and share the motivations, design choices, experiments, and learnings that informed its development. Our goal i...
![[Paper Review] Beyond Language Modeling: An Exploration of Multimodal Pretraining](/assets/images/blog/20260308-paper-2603-03276-beyond-language-modeling-an-ex.jpg)
[Paper Review] Beyond Language Modeling: An Exploration of Multimodal Pretraining
The visual world offers a critical axis for advancing foundation models beyond language. Despite growing interest in this direction, the design space for native multimodal models remains opaque. We pr...
![[Paper Review] Chain of World: World Model Thinking in Latent Motion](/assets/images/blog/20260308-paper-2603-03195-chain-of-world-world-model-thi.jpg)
[Paper Review] Chain of World: World Model Thinking in Latent Motion
Vision-Language-Action (VLA) models are a promising path toward embodied intelligence, yet they often overlook the predictive and temporal-causal structure underlying visual dynamics. World-model VLAs...
