SuanLab

Blog

Posts 1-12 of 244

[Paper Review] All Circuits Lead to Rome: Rethinking Functional Anisotropy in Circuit and Sheaf Discovery for LLMs
2026-05-14 · 7 · Paper Review

In this paper, we present empirical and theoretical evidence against a central but largely implicit assumption in circuit and sheaf discovery (CSD), which we term the Functional Anisotropy Hypothesis:...

Paper Review
cs.CL
[Paper Review] ELF: Embedded Language Flows
2026-05-14 · 7 · Paper Review

Diffusion and flow-based models have become the de facto approaches for generating continuous data, e.g., in domains such as images and videos. Their success has attracted growing interest in applying...

Paper Review
cs.CL
cs.AI
+1
[Paper Review] A Single Neuron Is Sufficient to Bypass Safety Alignment in Large Language Models
2026-05-14 · 7 · Paper Review

Safety alignment in language models operates through two mechanistically distinct systems: refusal neurons that gate whether harmful knowledge is expressed, and concept neurons that encode the harmful...

Paper Review
cs.CL
cs.AI
+1
[Paper Review] LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
2026-05-14 · 9 · Paper Review

Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during inference. However, existing TTS strategies are larg...

Paper Review
cs.CL
[Paper Review] Complex-Valued Phase-Coherent Transformer
2026-05-13 · 8 · Paper Review

Complex-valued Transformers have largely inherited softmax attention from real-valued architectures. However, row-normalised token competition is not necessarily aligned with phase-preserving computat...

Paper Review
cs.LG
[Paper Review] Fast Byte Latent Transformer
2026-05-13 · 7 · Paper Review

Recent byte-level language models (LMs) match the performance of token-level models without relying on subword vocabularies, yet their utility is limited by slow, byte-by-byte autoregressive generatio...

Paper Review
cs.CL
cs.AI
+1
[Paper Review] Temporal Straightening for Latent Planning
2026-05-11 · 8 · Paper Review

Learning good representations is essential for latent planning with world models. While pretrained visual encoders produce strong semantic visual features, they are not tailored to planning and contai...

Paper Review
cs.LG
A New Revolution in Natural Language Processing: The Emergence of Large Language Models
2026-05-09 · 6 · General

In recent years, the most notable development in artificial intelligence (AI) has undoubtedly been the rise of large language models (LLMs). Models such as OpenAI's GPT-4, Google's Gemini, and Meta's Llama 3 have brought innovation to the field of natural language processing (NLP). These...

[Paper Review] A Theory of Generalization in Deep Learning
2026-05-09 · 14 · Paper Review

We present a non-asymptotic theory of generalization in deep learning where the empirical neural tangent kernel partitions the output space. In directions corresponding to signal, error dissipates rap...

Paper Review
cs.LG
stat.ML
+1
[Paper Review] Temporal Straightening for Latent Planning
2026-05-09 · 8 · Paper Review

Learning good representations is essential for latent planning with world models. While pretrained visual encoders produce strong semantic visual features, they are not tailored to planning and contai...

Paper Review
cs.LG
[Paper Review] From Context to Skills: Can Language Models Learn from Context Skillfully?
2026-05-07 · 7 · Paper Review

Many real-world tasks require language models (LMs) to reason over complex contexts that exceed their parametric knowledge. This calls for context learning, where LMs directly learn relevant knowledge...

Paper Review
cs.AI
[Paper Review] Information Gravity: A Field-Theoretic Model for Token Selection in Large Language Models
2026-05-06 · 9 · Paper Review

We propose a theoretical model called "information gravity" to describe the text generation process in large language models (LLMs). The model uses physical apparatus from field theory and spacetime g...

Paper Review
cs.CL