Blog
Posts 1-12 of 244
![[Paper Review] All Circuits Lead to Rome: Rethinking Functional Anisotropy in Circuit and Sheaf Discovery for LLMs](/assets/images/blog/20260514-paper-2605-12671-all-circuits-lead-to-rome-reth.jpg)
[Paper Review] All Circuits Lead to Rome: Rethinking Functional Anisotropy in Circuit and Sheaf Discovery for LLMs
In this paper, we present empirical and theoretical evidence against a central but largely implicit assumption in circuit and sheaf discovery (CSD), which we term the Functional Anisotropy Hypothesis:...
![[Paper Review] ELF: Embedded Language Flows](/assets/images/blog/20260514-paper-2605-10938-elf-embedded-language-flows.jpg)
[Paper Review] ELF: Embedded Language Flows
Diffusion and flow-based models have become the de facto approaches for generating continuous data, e.g., in domains such as images and videos. Their success has attracted growing interest in applying...
![[Paper Review] A Single Neuron Is Sufficient to Bypass Safety Alignment in Large Language Models](/assets/images/blog/20260514-paper-2605-08513-a-single-neuron-is-sufficient-.jpg)
[Paper Review] A Single Neuron Is Sufficient to Bypass Safety Alignment in Large Language Models
Safety alignment in language models operates through two mechanistically distinct systems: refusal neurons that gate whether harmful knowledge is expressed, and concept neurons that encode the harmful...
![[Paper Review] LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling](/assets/images/blog/20260514-paper-2605-08083-llms-improving-llms-agentic-di.jpg)
[Paper Review] LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during inference. However, existing TTS strategies are larg...
![[Paper Review] Complex-Valued Phase-Coherent Transformer](/assets/images/blog/20260513-paper-2605-10123-complex-valued-phase-coherent-.jpg)
[Paper Review] Complex-Valued Phase-Coherent Transformer
Complex-valued Transformers have largely inherited softmax attention from real-valued architectures. However, row-normalised token competition is not necessarily aligned with phase-preserving computat...
![[Paper Review] Fast Byte Latent Transformer](/assets/images/blog/20260513-paper-2605-08044-fast-byte-latent-transformer.jpg)
[Paper Review] Fast Byte Latent Transformer
Recent byte-level language models (LMs) match the performance of token-level models without relying on subword vocabularies, yet their utility is limited by slow, byte-by-byte autoregressive generatio...
![[Paper Review] Temporal Straightening for Latent Planning](/assets/images/blog/20260511-paper-2603-12231-temporal-straightening-for-lat.jpg)
[Paper Review] Temporal Straightening for Latent Planning
Learning good representations is essential for latent planning with world models. While pretrained visual encoders produce strong semantic visual features, they are not tailored to planning and contai...

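A New Revolution in Natural Language Processing: The Rise of Large Language Models
In recent years, the most notable development in artificial intelligence (AI) has undoubtedly been the rise of large language models (LLMs). Models such as OpenAI's GPT-4, Google's Gemini, and Meta's Llama 3 have transformed the field of natural language processing (NLP). These mo...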
![[Paper Review] A Theory of Generalization in Deep Learning](/assets/images/blog/20260509-paper-2605-01172-a-theory-of-generalization-in-.jpg)
[Paper Review] A Theory of Generalization in Deep Learning
We present a non-asymptotic theory of generalization in deep learning where the empirical neural tangent kernel partitions the output space. In directions corresponding to signal, error dissipates rap...
![[Paper Review] Temporal Straightening for Latent Planning](/assets/images/blog/20260509-paper-2603-12231-temporal-straightening-for-lat.jpg)
[Paper Review] Temporal Straightening for Latent Planning
Learning good representations is essential for latent planning with world models. While pretrained visual encoders produce strong semantic visual features, they are not tailored to planning and contai...
![[Paper Review] From Context to Skills: Can Language Models Learn from Context Skillfully?](/assets/images/blog/20260507-paper-2604-27660-from-context-to-skills-can-lan.jpg)
[Paper Review] From Context to Skills: Can Language Models Learn from Context Skillfully?
Many real-world tasks require language models (LMs) to reason over complex contexts that exceed their parametric knowledge. This calls for context learning, where LMs directly learn relevant knowledge...
![[Paper Review] Information Gravity: A Field-Theoretic Model for Token Selection in Large Language Models](/assets/images/blog/20260506-paper-2504-20951-information-gravity-a-field-th.jpg)
[Paper Review] Information Gravity: A Field-Theoretic Model for Token Selection in Large Language Models
We propose a theoretical model called "information gravity" to describe the text generation process in large language models (LLMs). The model uses physical apparatus from field theory and spacetime g...
