<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>LLM | Ben Benhemo</title><link>https://benbenhemo.com/tag/llm/</link><atom:link href="https://benbenhemo.com/tag/llm/index.xml" rel="self" type="application/rss+xml"/><description>LLM</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Sat, 04 Oct 2025 00:00:00 +0000</lastBuildDate><image><url>https://benbenhemo.com/media/icon_hu9e1d2b86e2bb2877819b4fa069da1ee7_107810_512x512_fill_lanczos_center_3.png</url><title>LLM</title><link>https://benbenhemo.com/tag/llm/</link></image><item><title>Lightweight Reachability System: GitLab Knowledge Graph + AI Agents</title><link>https://benbenhemo.com/post/lightweight-reachability/</link><pubDate>Sat, 04 Oct 2025 00:00:00 +0000</pubDate><guid>https://benbenhemo.com/post/lightweight-reachability/</guid><description>&lt;h2 id="the-problem-not-every-sca-vulnerability-is-exploitable">The Problem: Not Every SCA Vulnerability is Exploitable&lt;/h2>
&lt;p>When &lt;a href="https://nvd.nist.gov/vuln/detail/CVE-2025-29927" target="_blank" rel="noopener">CVE-2025-29927&lt;/a> dropped for Next.js, AppSec teams worldwide immediately started working on a fix.
Dependency scanners generated a flood of security findings across every company codebase that contained a vulnerable version of the package.&lt;/p>
&lt;p>Many of those alerts were false positives, because most traditional dependency scanners miss a key distinction: &lt;strong>Vulnerability != Exploitability&lt;/strong>.&lt;/p>
&lt;p>The Next.js CVE perfectly illustrates this gap. From an exploitability perspective, this vulnerability is only relevant where:&lt;/p>
&lt;ul>
&lt;li>The deployment is self-hosted, using &lt;code>next start&lt;/code> with &lt;code>output: standalone&lt;/code>&lt;/li>
&lt;li>A middleware file (&lt;code>middleware.js&lt;/code> or &lt;code>middleware.ts&lt;/code>) exists and is actively used&lt;/li>
&lt;li>The middleware performs critical security operations such as authentication or authorization checks&lt;/li>
&lt;/ul>
&lt;p>Yet every traditional dependency scanner will flag any application using affected Next.js versions, regardless of deployment context or middleware usage. This produces a flood of false positives that burns security &amp;amp; engineering hours and causes alert fatigue.&lt;/p>
&lt;h2 id="why-reachability-analysis-matters">Why Reachability Analysis Matters&lt;/h2>
&lt;p>Traditional SCA (Software Composition Analysis) tools follow a simple, binary process: extract the SBOM ➡️ compare versions against a vulnerability database ➡️ flag any match as a risk.&lt;/p>
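&lt;p>A minimal sketch of that version-matching flow is shown here. The &lt;code>Advisory&lt;/code> class, the &lt;code>naive_sca_scan&lt;/code> function, and the &lt;code>example-lib&lt;/code> entry are hypothetical names made up for illustration. Real scanners match version ranges, not a set of exact versions.&lt;/p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    cve_id: str
    package: str
    # Exact versions only, to keep the sketch short; real databases use ranges.
    affected_versions: frozenset


def naive_sca_scan(sbom, advisories):
    """Flag every SBOM entry whose exact version appears in an advisory.

    Nothing here looks at how the package is used, which is the blind spot.
    """
    findings = []
    for package, version in sbom.items():
        for advisory in advisories:
            if advisory.package == package and version in advisory.affected_versions:
                findings.append((package, version, advisory.cve_id))
    return findings


# Illustrative data: requests 2.32.3 and CVE-2024-47081 are the pairing
# analyzed in this post; "example-lib" is a made-up package.
SBOM = {"requests": "2.32.3", "example-lib": "1.0.0"}
ADVISORIES = [Advisory("CVE-2024-47081", "requests", frozenset({"2.32.3"}))]

print(naive_sca_scan(SBOM, ADVISORIES))
```

&lt;p>The scan reports the package purely because the version matches. It has no notion of imports, call paths, or input sources.&lt;/p>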
&lt;p>This simplistic approach results in a high rate of false positives, as it doesn&amp;rsquo;t consider whether the vulnerable code is actually reachable or exploitable in your specific context.
Real exploitation requires all of the following:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Vulnerable code must be imported&lt;/strong> - Is the vulnerable function/module actually imported?&lt;/li>
&lt;li>&lt;strong>Reachable execution path&lt;/strong> - Does the call graph show that application flow can reach the vulnerable code?&lt;/li>
&lt;li>&lt;strong>Client-controlled input&lt;/strong> - Can external input influence the vulnerable code path?&lt;/li>
&lt;/ol>
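&lt;p>The three conditions above can be modeled as a simple conjunction. This is only a sketch: &lt;code>ReachabilityEvidence&lt;/code> and &lt;code>verdict&lt;/code> are hypothetical names, and in practice each boolean would be filled in by a code analysis step, not set by hand.&lt;/p>

```python
from dataclasses import dataclass


@dataclass
class ReachabilityEvidence:
    imported: bool            # 1. the vulnerable function/module is imported
    path_reachable: bool      # 2. the call graph reaches the vulnerable code
    input_controllable: bool  # 3. external input can influence that path


def verdict(evidence):
    """Return REACHABLE only when all three conditions hold."""
    if evidence.imported and evidence.path_reachable and evidence.input_controllable:
        return "REACHABLE"
    return "UNREACHABLE"


# A package that is imported and called, but only with trusted input.
print(verdict(ReachabilityEvidence(True, True, False)))
```

&lt;p>Any single missing condition is enough to downgrade a finding, which is why a version match alone says little about real risk.&lt;/p>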
&lt;h2 id="gitlabs-knowledge-graph-tool-overview">GitLab&amp;rsquo;s Knowledge Graph: Tool Overview&lt;/h2>
&lt;p>GitLab recently released a &lt;a href="https://gitlab-org.gitlab.io/rust/knowledge-graph/" target="_blank" rel="noopener">Knowledge Graph tool&lt;/a> that transforms your codebase into a queryable graph database. It understands:&lt;/p>
&lt;ul>
&lt;li>Function call hierarchies&lt;/li>
&lt;li>Import dependencies&lt;/li>
&lt;li>Variable flow&lt;/li>
&lt;li>Cross-file relationships&lt;/li>
&lt;/ul>
&lt;img src="knowledge-graph.jpg" alt="GitLab Knowledge Graph" style="width: 100%; max-width: 100%; height: auto;" />
&lt;p>What makes this tool stand out is that it provides a local MCP setup, so it integrates easily with your AI agents &amp;amp; LLMs and lets them query your codebase for code insights.&lt;/p>
&lt;div class="flex px-4 py-3 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-400">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">While this tool is primarily designed to assist engineers during development, I realized its capabilities could be redirected to help AppSec engineers investigate SCA vulnerabilities.&lt;/span>
&lt;/div>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe src="https://www.youtube.com/embed/wL6-m5_2FH8" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" allowfullscreen title="YouTube Video">&lt;/iframe>
&lt;/div>
&lt;h2 id="quick-example-cve-2024-47081-analysis">Quick Example: CVE-2024-47081 Analysis&lt;/h2>
&lt;p>To demonstrate this approach, I analyzed a sample repository for SCA findings. The initial traditional scan identified the Python &lt;code>requests&lt;/code> library at version 2.32.3, which is affected by CVE-2024-47081, a credential leakage vulnerability.
This vulnerability allows &lt;code>.netrc&lt;/code> credential leakage when processing malicious URLs, but our analysis revealed:&lt;/p>
&lt;ul>
&lt;li>Three files use the requests library: &lt;code>github_collector.py&lt;/code>, &lt;code>npm_collector.py&lt;/code>, and &lt;code>smithery_collector.py&lt;/code>&lt;/li>
&lt;li>All requests calls use hardcoded URL templates with validated string formatting&lt;/li>
&lt;li>No arbitrary URL injection vectors found in the codebase&lt;/li>
&lt;/ul>
&lt;p>This analysis took &lt;strong>&amp;lt; 30 seconds&lt;/strong> using GitLab&amp;rsquo;s Knowledge Graph + Claude, compared with the much longer time a manual code review would take. The system correctly identified that while the vulnerable package was present, the specific conditions for exploitation were not met.&lt;/p>
&lt;img src="screenshot-requests.jpg" alt="CVE Analysis Screenshot" style="width: 100%; max-width: 100%; height: auto;" />
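&lt;p>For intuition, the hardcoded URL finding above can be roughly approximated with the Python standard library &lt;code>ast&lt;/code> module. This sketch is not how the Knowledge Graph or the agent works internally. The function name &lt;code>classify_requests_calls&lt;/code> and the sample source are made up for illustration, and the check only covers direct calls such as &lt;code>requests.get(...)&lt;/code> with a positional URL.&lt;/p>

```python
import ast

# Only methods whose first positional argument is the URL.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head"}


def classify_requests_calls(source):
    """Return sorted (line, kind) pairs for requests.METHOD(url, ...) calls.

    kind is "literal" for a plain string URL, "template" for an f-string,
    and "dynamic" for anything else (variables, concatenation, calls).
    """
    results = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_requests_call = (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "requests"
            and func.attr in HTTP_METHODS
        )
        if not is_requests_call or not node.args:
            continue
        url_arg = node.args[0]
        if isinstance(url_arg, ast.Constant) and isinstance(url_arg.value, str):
            kind = "literal"
        elif isinstance(url_arg, ast.JoinedStr):
            kind = "template"
        else:
            kind = "dynamic"
        results.append((node.lineno, kind))
    return sorted(results)


# Made-up sample source, not taken from the analyzed repository.
SAMPLE = '''
import requests

def fetch_repo(owner, name):
    return requests.get(f"https://api.github.com/repos/{owner}/{name}")

def fetch_any(url):
    return requests.get(url)
'''

for line, kind in classify_requests_calls(SAMPLE):
    print(line, kind)
```

&lt;p>A &amp;ldquo;template&amp;rdquo; result does not prove safety on its own, since the interpolated values still need review. It only narrows down which calls deserve a closer look, and the &amp;ldquo;dynamic&amp;rdquo; calls are the ones to trace back to their input sources.&lt;/p>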
&lt;h2 id="automating-the-process">Automating the Process&lt;/h2>
&lt;p>Taking this a step further, we can automate the process to speed up remediation: a job takes the SCA findings from our traditional scanners and uses the knowledge graph along with an AI agent to review them, automatically generating a reachability report for each repository.&lt;/p>
&lt;h3 id="architecture-overview">Architecture Overview&lt;/h3>
&lt;div class="mermaid">graph LR
A[SCA Scanner Findings] --> B[LLM + Knowledge Graph]
B --> C[Call Graph Analysis]
C --> D[Reachability Report]
E[Your Codebase] --> B
&lt;/div>
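&lt;p>One way to wire up this flow is sketched below. Everything here is an assumption made for illustration: &lt;code>Finding&lt;/code>, &lt;code>build_reachability_report&lt;/code>, the &lt;code>NEEDS_REVIEW&lt;/code> fallback, and &lt;code>stub_agent&lt;/code> are hypothetical, and the stub stands in for the real LLM + Knowledge Graph call, which is not shown.&lt;/p>

```python
from dataclasses import dataclass
from typing import Callable, Iterable

VALID_VERDICTS = ("REACHABLE", "UNREACHABLE")


@dataclass(frozen=True)
class Finding:
    cve_id: str
    package: str
    version: str


def build_reachability_report(
    project: str,
    findings: Iterable[Finding],
    analyze: Callable[[str, Finding], str],
) -> dict:
    """Run the agent callable on each SCA finding and collect the verdicts."""
    results = []
    for finding in findings:
        verdict = analyze(project, finding)
        if verdict not in VALID_VERDICTS:
            # Hypothetical fallback: never trust an unexpected agent answer.
            verdict = "NEEDS_REVIEW"
        results.append(
            {
                "cve": finding.cve_id,
                "component": f"{finding.package}@{finding.version}",
                "verdict": verdict,
            }
        )
    return {"project": project, "results": results}


def stub_agent(project: str, finding: Finding) -> str:
    """Placeholder for the real LLM + Knowledge Graph analysis."""
    return "UNREACHABLE"


FINDINGS = [Finding("CVE-2024-47081", "requests", "2.32.3")]
REPORT = build_reachability_report("sample-repo", FINDINGS, stub_agent)
print(REPORT)
```

&lt;p>Passing the agent in as a callable keeps the job logic testable without network access, and the fallback verdict routes any malformed agent output to a human.&lt;/p>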
&lt;h3 id="example-the-reachability-analysis-prompt">Example: The Reachability Analysis Prompt&lt;/h3>
&lt;p>Here&amp;rsquo;s the prompt template that powers the analysis:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-markdown" data-lang="markdown">&lt;span class="line">&lt;span class="cl">You are a security researcher analyzing CVE reachability.
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Given CVE: [CVE-ID]
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Vulnerable component: [package@version]
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Vulnerability details: [description]
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Target Project: [project]
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Using the GitLab Knowledge Graph, determine:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">1.&lt;/span> Is the vulnerable package imported anywhere?
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">2.&lt;/span> What are the call paths to vulnerable functions?
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">3.&lt;/span> Are these paths reachable from external entry points?
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">4.&lt;/span> What conditions must be met for exploitation?
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Provide a reachability verdict: REACHABLE | UNREACHABLE
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="beyond-cves-advanced-security-applications">Beyond CVEs: Advanced Security Applications&lt;/h2>
&lt;p>The knowledge graph isn&amp;rsquo;t just for investigating CVEs. It also opens up new possibilities for answering complex security questions that traditional SAST and SCA tools can&amp;rsquo;t address. For example:&lt;/p>
&lt;h3 id="finding-authentication-bypasses">Finding Authentication Bypasses&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">&amp;#34;Show me all code flows that reach database queries without passing through an authentication check&amp;#34;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="tracking-pii-flow">Tracking PII Flow&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">&amp;#34;Trace all paths where PII data flows from API input to external services&amp;#34;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="detecting-ssrf-patterns">Detecting SSRF Patterns&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">&amp;#34;Find all places where user input can influence URL parameters in HTTP requests&amp;#34;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Combining GitLab&amp;rsquo;s open source Knowledge Graph with AI agents enables deep analysis of SCA vulnerabilities on a budget. This approach not only saves time and reduces alert fatigue, but also opens up new possibilities for automated security analysis without the need for expensive commercial tools.&lt;/strong>&lt;/p>
&lt;hr>
&lt;small>
&lt;p>&lt;strong>References:&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>CVE-2025-29927 - Next.js Authorization Bypass Vulnerability | &lt;a href="https://nvd.nist.gov/vuln/detail/CVE-2025-29927" target="_blank" rel="noopener">National Vulnerability Database&lt;/a>&lt;/li>
&lt;li>CVE-2025-29927 Deep Dive | &lt;a href="https://jfrog.com/blog/cve-2025-29927-next-js-authorization-bypass/" target="_blank" rel="noopener">JFrog Security Research&lt;/a>&lt;/li>
&lt;li>GitLab Knowledge Graph MCP Server | &lt;a href="https://gitlab-org.gitlab.io/rust/knowledge-graph/" target="_blank" rel="noopener">Official Documentation&lt;/a>&lt;/li>
&lt;li>CVE-2024-47081 - Python Requests Credential Leakage | &lt;a href="https://github.com/advisories/GHSA-9wx4-h78v-vm56" target="_blank" rel="noopener">GitHub Security Advisory&lt;/a>&lt;/li>
&lt;/ol>
&lt;/small></description></item></channel></rss>