fix: support new webpack chunk format for ondemand.s lookup #416

Open
steverex169 wants to merge 1 commit into d60:main from steverex169:fix/x-client-transaction-webpack-format

Conversation

@steverex169 steverex169 commented Apr 9, 2026

Problem

Twitter changed the structure of x.com's HTML, breaking the ClientTransaction.get_indices() method for all users.

The old format was:

'ondemand.s': 'abc123'

The current format uses a webpack chunk map split across two objects:

// name map
{20113:"ondemand.s", ...}
// hash map (separate)
{20113:"2c5bb94", ...}

The existing ON_DEMAND_FILE_REGEX no longer matches, causing this error on every single API call:

Exception: Couldn't get KEY_BYTE indices
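
A minimal sketch of why the lookup fails, assuming the page contains fragments shaped like the samples above. The two sample strings are illustrative rather than captured from x.com; the pattern is the ON_DEMAND_FILE_REGEX quoted in the review further down this thread.

```python
import re

# Legacy pattern, copied from the review excerpt of transaction.py.
ON_DEMAND_FILE_REGEX = re.compile(
    r"""['|\"]{1}ondemand\.s['|\"]{1}:\s*['|\"]{1}([\w]*)['|\"]{1}""",
    flags=(re.VERBOSE | re.MULTILINE),
)

# Illustrative inputs modelled on the two formats described above.
old_html = "'ondemand.s': 'abc123'"
new_html = '{20113:"ondemand.s",20114:"other"} {20113:"2c5bb94",20114:"ffff"}'

old_match = ON_DEMAND_FILE_REGEX.search(old_html)
new_match = ON_DEMAND_FILE_REGEX.search(new_html)

# Old format: "ondemand.s" is the key, so the hash is captured directly.
print(old_match.group(1) if old_match else None)
# New format: "ondemand.s" is a value, so the pattern finds nothing.
print(new_match)
```

In the new layout the name is never followed by a colon and a quoted hash, which is what the legacy pattern requires.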

Fix

  • Added CHUNK_NAME_REGEX to extract the chunk ID from the name map
  • Falls back to resolving the hash from the separate hash map using that chunk ID
  • Old format still works (tried first), so this is fully backwards compatible
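
The steps above can be sketched roughly as follows. This is an approximation based on the excerpts quoted in this thread, not the actual diff; `resolve_ondemand_hash` is a hypothetical helper name, and the sample inputs in the usage lines are made up.

```python
import re
from typing import Optional

ON_DEMAND_FILE_REGEX = re.compile(
    r"""['|\"]{1}ondemand\.s['|\"]{1}:\s*['|\"]{1}([\w]*)['|\"]{1}""",
    flags=(re.VERBOSE | re.MULTILINE),
)
CHUNK_NAME_REGEX = re.compile(r'(\d+):"ondemand\.s"')


def resolve_ondemand_hash(html: str) -> Optional[str]:
    """Hypothetical helper: return the ondemand.s hash, or None if not found."""
    # Step 1: legacy inline format, tried first for backwards compatibility.
    legacy = ON_DEMAND_FILE_REGEX.search(html)
    if legacy:
        return legacy.group(1)

    # Step 2: find the chunk ID whose name is "ondemand.s".
    name_entry = CHUNK_NAME_REGEX.search(html)
    if name_entry is None:
        return None
    chunk_id = name_entry.group(1)

    # Step 3: find the same chunk ID in the hash map. Because \w does not
    # match ".", the name-map entry 20113:"ondemand.s" is not matched here.
    hash_pattern = re.compile(rf'{re.escape(chunk_id)}:"(\w+)"')
    for candidate in hash_pattern.finditer(html):
        return candidate.group(1)
    return None


print(resolve_ondemand_hash("'ondemand.s': 'abc123'"))
print(resolve_ondemand_hash('{20113:"ondemand.s"} {20113:"2c5bb94"}'))
```

The sketch omits the extra candidate filtering that the reviews below discuss.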

Testing

Verified locally against live x.com — list scraping, tweet fetching, and search all work correctly after the fix.

Summary by Sourcery

Bug Fixes:

  • Fix failure to locate ondemand.s script hash after Twitter changed the webpack chunking scheme, restoring successful key byte index extraction for API calls.

Summary by CodeRabbit

  • Bug Fixes
    • Improved handling of updated asset manifests so on-demand resources load reliably across old and new backend formats, reducing failed asset fetches and page load errors for end users.

sourcery-ai bot commented Apr 9, 2026

Reviewer's Guide

Updates ondemand JavaScript asset discovery to handle Twitter/X's new webpack chunk mapping format while keeping backward compatibility with the previous inline hash format.

Class diagram for ClientTransaction get_indices hash resolution

classDiagram
    class ClientTransaction {
        +home_page_response
        +get_indices(home_page_response, session, headers)
    }

    class RegexUtilities {
        +ON_DEMAND_FILE_REGEX
        +CHUNK_NAME_REGEX
        +INDICES_REGEX
    }

    ClientTransaction ..> RegexUtilities : uses

Flow diagram for ondemand.s hash resolution in get_indices

flowchart TD
    A["Start get_indices"] --> B["Validate response and select home_page_response"]
    B --> C["Convert response to string response_str"]
    C --> D["Search response_str with ON_DEMAND_FILE_REGEX"]
    D --> E{Old format match?}

    E -->|Yes| F["Extract file_hash from on_demand_file.group(1)"]
    F --> M["Build ondemand.s URL with file_hash"]

    E -->|No| G["Search response_str with CHUNK_NAME_REGEX"]
    G --> H{Chunk ID match?}

    H -->|No| L["file_hash remains None"]
    H -->|Yes| I["Extract chunk_id from chunk_id_match.group(1)"]
    I --> J["Compile hash_pattern using chunk_id"]
    J --> K["Iterate all hash_pattern matches in response_str"]
    K --> N{Valid hash candidate?}

    N -->|Yes| O["Set file_hash to candidate value"]
    N -->|No| P["Continue iterating matches"]
    P --> K

    L --> Q{file_hash is set?}
    O --> Q
    F --> Q

    Q -->|No| R["Abort: cannot resolve ondemand.s hash"]
    Q -->|Yes| M

    M --> S["GET ondemand.s file via session.request"]
    S --> T["Extract key_byte_indices with INDICES_REGEX"]
    T --> U["Return key_byte_indices"]

File-Level Changes

Change: Extend ondemand.s asset lookup to support new webpack chunk ID + hash map format while preserving support for the old inline hash format.
Details:
  • Add CHUNK_NAME_REGEX to detect the ondemand.s chunk ID from the webpack name map in the HTML response.
  • Refactor get_indices to stringify the home page response once and attempt the legacy ON_DEMAND_FILE_REGEX match first.
  • When legacy lookup fails, search for the ondemand.s chunk ID using CHUNK_NAME_REGEX, then locate the associated hash via a dynamically constructed regex over the separate hash map.
  • Filter candidate hash matches to ignore the literal 'ondemand' value and constrain hash length to a reasonable maximum before selecting the file hash.
  • Build the ondemand.s asset URL using the resolved file_hash (from either path) and proceed with existing logic to fetch the script and extract key byte indices.
Files: twikit/x_client_transaction/transaction.py

Contributor

coderabbitai bot commented Apr 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6a369603-ad69-421f-b5fe-82b7254e399d

📥 Commits

Reviewing files that changed from the base of the PR and between f367875 and a7099a7.

📒 Files selected for processing (1)
  • twikit/x_client_transaction/transaction.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • twikit/x_client_transaction/transaction.py

📝 Walkthrough


Updated webpack manifest parsing in the transaction module to handle a new response format. Added a chunk-name regex and fallback logic in get_indices() to derive file_hash either from a direct "ondemand.s": "hash" entry or indirectly via a chunk ID that maps to "ondemand.s" and then to the hash.

Changes

Cohort: Webpack Manifest Format Adaptation
File(s): twikit/x_client_transaction/transaction.py
Summary: Added CHUNK_NAME_REGEX and changed get_indices() to serialize the validated home page response once, attempt legacy 'ondemand.s': '<hash>' lookup, and fall back to finding a chunk ID that references "ondemand.s" and then resolving its hash. Adjusted URL construction and conditional fetching to only request ondemand.s after file_hash is resolved; extraction of key_byte_indices now depends on the resolved file_hash.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Chunks and hashes, what a prance,
A manifest shifted, so I dance,
I hunt the chunk, then chase the hash,
From old to new I make a dash,
Small regex hops — success at last! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly and accurately summarizes the main change: adding support for a new webpack chunk format in the ondemand.s lookup mechanism.


Twitter changed the x.com HTML structure from the old format:
  'ondemand.s': 'hash'
to a new webpack chunk map format:
  chunk_id:"ondemand.s"  (name map)
  chunk_id:"hash"        (separate hash map)

The old ON_DEMAND_FILE_REGEX no longer matches, causing
"Couldn't get KEY_BYTE indices" on every API call.

This fix detects both formats: tries the old regex first,
then falls back to extracting the chunk ID from the name map
and resolving its hash from the separate hash map.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@steverex169 steverex169 force-pushed the fix/x-client-transaction-webpack-format branch from f367875 to a7099a7 on April 9, 2026 14:09
@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • The new chunk/hash extraction logic relies on a fairly loose hash_pattern that will match any {chunk_id:"..."} pair; consider tightening this (e.g., via surrounding context or restricting the object scope) to reduce the risk of accidentally picking up unrelated values.
  • The heuristic val != 'ondemand' and len(val) <= 12 is somewhat opaque and fragile; extracting these constants into named variables or adding a small helper with a descriptive name would make the intent and constraints clearer and easier to adjust when Twitter changes formats again.
  • The code currently recompiles hash_pattern on every call to get_indices; if this pattern is stable, precompiling it (or using a function that builds it once per chunk_id) would avoid repeated compilation and make the code more consistent with the other module-level regexes.
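
A sketch of what the last two suggestions could look like, under stated assumptions: the names `MAX_HASH_LENGTH`, `NAME_FRAGMENT`, `hash_pattern_for`, and `is_plausible_hash` are hypothetical, and the values 12 and 'ondemand' are simply the ones from the PR's heuristic.

```python
import re
from functools import lru_cache

# Named constants for the PR's heuristic (values taken from the diff).
MAX_HASH_LENGTH = 12
NAME_FRAGMENT = "ondemand"


@lru_cache(maxsize=8)
def hash_pattern_for(chunk_id: str) -> re.Pattern:
    """Build the per-chunk hash pattern once and reuse it on later calls."""
    return re.compile(rf'{re.escape(chunk_id)}:"(\w+)"')


def is_plausible_hash(value: str) -> bool:
    """True if the value passes the PR's length and name checks."""
    return value != NAME_FRAGMENT and len(value) <= MAX_HASH_LENGTH


pattern = hash_pattern_for("20113")
found = pattern.search('{20113:"2c5bb94"}')
print(found.group(1), is_plausible_hash(found.group(1)))
```

Python's `re` module already keeps an internal cache of recently compiled patterns, so the saving from `lru_cache` is likely small; the main benefit here is readability.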
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new chunk/hash extraction logic relies on a fairly loose `hash_pattern` that will match any `{chunk_id:"..."}` pair; consider tightening this (e.g., via surrounding context or restricting the object scope) to reduce the risk of accidentally picking up unrelated values.
- The heuristic `val != 'ondemand' and len(val) <= 12` is somewhat opaque and fragile; extracting these constants into named variables or adding a small helper with a descriptive name would make the intent and constraints clearer and easier to adjust when Twitter changes formats again.
- The code currently recompiles `hash_pattern` on every call to `get_indices`; if this pattern is stable, precompiling it (or using a function that builds it once per `chunk_id`) would avoid repeated compilation and make the code more consistent with the other module-level regexes.

## Individual Comments

### Comment 1
<location path="twikit/x_client_transaction/transaction.py" line_range="59-64" />
<code_context>
+            if chunk_id_match:
+                chunk_id = chunk_id_match.group(1)
+                hash_pattern = re.compile(rf'{chunk_id}:"([\w]+)"')
+                all_matches = list(hash_pattern.finditer(response_str))
+                file_hash = None
+                for m in all_matches:
+                    val = m.group(1)
+                    if val != 'ondemand' and len(val) <= 12:
+                        file_hash = val
+                        break
+            else:
</code_context>
<issue_to_address>
**suggestion (performance):** Collecting all matches into a list is unnecessary and slightly wasteful for large responses.

Because you only need the first matching `val` that satisfies `val != 'ondemand' and len(val) <= 12`, you can iterate directly over `hash_pattern.finditer(response_str)` and break on the first suitable match instead of building `all_matches` as a list. This avoids the intermediate list and reduces work/memory usage for large `response_str` values.
</issue_to_address>

Comment on lines +59 to +64
all_matches = list(hash_pattern.finditer(response_str))
file_hash = None
for m in all_matches:
    val = m.group(1)
    if val != 'ondemand' and len(val) <= 12:
        file_hash = val

suggestion (performance): Collecting all matches into a list is unnecessary and slightly wasteful for large responses.

Because you only need the first matching val that satisfies val != 'ondemand' and len(val) <= 12, you can iterate directly over hash_pattern.finditer(response_str) and break on the first suitable match instead of building all_matches as a list. This avoids the intermediate list and reduces work/memory usage for large response_str values.
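
One way to express this suggestion is a generator combined with `next()`, which stops at the first suitable match without building a list. This is a sketch only; `first_hash_candidate` is a hypothetical name, and the filter is the one from the PR.

```python
import re
from typing import Optional


def first_hash_candidate(response_str: str, chunk_id: str) -> Optional[str]:
    """Return the first value for chunk_id that passes the PR's filter."""
    hash_pattern = re.compile(rf'{re.escape(chunk_id)}:"(\w+)"')
    # finditer is lazy, so matching stops as soon as next() gets a value.
    values = (m.group(1) for m in hash_pattern.finditer(response_str))
    return next(
        (v for v in values if v != "ondemand" and len(v) <= 12),
        None,  # default when no candidate qualifies
    )


sample = '{20113:"ondemand.s"} {20113:"averyveryverylonghash1"} {20113:"2c5bb94"}'
print(first_hash_candidate(sample, "20113"))
```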

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@twikit/x_client_transaction/transaction.py`:
- Line 18: CHUNK_NAME_REGEX is too strict and only matches unquoted, no-space
forms like 20113:"ondemand.s"; update CHUNK_NAME_REGEX to allow optional
single/double quotes around the numeric key and the value and permit arbitrary
spacing around the colon (e.g. use a pattern like
r'["\']?(\d+)["\']?\s*:\s*["\']?ondemand\.s["\']?' as the new regex), and apply
the same tolerant regex update to the other similar regexes/usages referenced
around lines 55-58 so all key/value formatting variants (quoted keys, spaces)
are matched.
- Around line 61-64: The loop that assigns file_hash is rejecting candidates by
a hard-coded length check ("len(val) <= 12"), which can drop valid webpack chunk
hashes; remove that arbitrary constraint in the block that iterates over
all_matches (the for m in all_matches loop) and instead accept any
non-'ondemand' match (val != 'ondemand') or replace the check with a proper
validation (e.g., match against a hex/base62 regex or a configurable
max_hash_length) before assigning file_hash; update references to file_hash
accordingly so downstream logic performs definitive validation rather than
relying on the 12-character heuristic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1aafa861-b8d8-4e35-8df6-540f0cdd32d9

📥 Commits

Reviewing files that changed from the base of the PR and between c3b7220 and f367875.

📒 Files selected for processing (1)
  • twikit/x_client_transaction/transaction.py

ON_DEMAND_FILE_REGEX = re.compile(
    r"""['|\"]{1}ondemand\.s['|\"]{1}:\s*['|\"]{1}([\w]*)['|\"]{1}""", flags=(re.VERBOSE | re.MULTILINE))
# New webpack format: chunk ID maps to name, separate hash map
CHUNK_NAME_REGEX = re.compile(r'(\d+):"ondemand\.s"')

⚠️ Potential issue | 🟠 Major

Make chunk-ID regex tolerant to key/value formatting variants.

The current pattern only matches 20113:"ondemand.s" exactly. If the runtime emits quoted keys or spacing (e.g., "20113": "ondemand.s"), this will fail and break index resolution again.

Proposed robust pattern update
-CHUNK_NAME_REGEX = re.compile(r'(\d+):"ondemand\.s"')
+CHUNK_NAME_REGEX = re.compile(
+    r"""['"]?(\d+)['"]?\s*:\s*['"]ondemand\.s['"]"""
+)
...
-                hash_pattern = re.compile(rf'{chunk_id}:"([\w]+)"')
+                hash_pattern = re.compile(
+                    rf"""['"]?{re.escape(chunk_id)}['"]?\s*:\s*['"]([A-Za-z0-9]+)['"]"""
+                )

Also applies to: 55-58
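
To see the difference between the two patterns, the sketch below runs both against three key/value spellings. The first is the format reported in the PR and the second is the example from this comment; the third (single quotes with spaces) is a hypothetical variant added for illustration.

```python
import re

STRICT = re.compile(r'(\d+):"ondemand\.s"')
TOLERANT = re.compile(r"""['"]?(\d+)['"]?\s*:\s*['"]ondemand\.s['"]""")

variants = [
    '20113:"ondemand.s"',        # format reported in the PR
    '"20113": "ondemand.s"',     # quoted key with a space
    "'20113' : 'ondemand.s'",    # hypothetical single-quoted variant
]


def chunk_id(pattern, text):
    """Return the captured chunk ID, or None when the pattern misses."""
    match = pattern.search(text)
    return match.group(1) if match else None


strict_ids = [chunk_id(STRICT, v) for v in variants]
tolerant_ids = [chunk_id(TOLERANT, v) for v in variants]
print(strict_ids)
print(tolerant_ids)
```

The strict pattern only recognises the first spelling, while the tolerant one recovers the chunk ID from all three.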

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@twikit/x_client_transaction/transaction.py` at line 18, CHUNK_NAME_REGEX is
too strict and only matches unquoted, no-space forms like 20113:"ondemand.s";
update CHUNK_NAME_REGEX to allow optional single/double quotes around the
numeric key and the value and permit arbitrary spacing around the colon (e.g.
use a pattern like r'["\']?(\d+)["\']?\s*:\s*["\']?ondemand\.s["\']?' as the new
regex), and apply the same tolerant regex update to the other similar
regexes/usages referenced around lines 55-58 so all key/value formatting
variants (quoted keys, spaces) are matched.

Comment on lines +61 to +64
for m in all_matches:
    val = m.group(1)
    if val != 'ondemand' and len(val) <= 12:
        file_hash = val

⚠️ Potential issue | 🟠 Major

Avoid hard-coding max hash length (<= 12) for candidate selection.

Webpack chunk hashes are not guaranteed to stay at or below 12 chars. This heuristic can silently reject valid hashes and reintroduce the "Couldn't get KEY_BYTE indices" failure.

Safer candidate filter
-                    if val != 'ondemand' and len(val) <= 12:
+                    # prefer hex-like hash candidates; tolerate future length changes
+                    if re.fullmatch(r"[0-9a-fA-F]{6,64}", val):
                         file_hash = val
                         break
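
A small check of how the proposed filter behaves, as a sketch. The sample values other than `2c5bb94` are made up, and `looks_like_hex_hash` is a hypothetical name. Note the trade-off: the filter assumes hashes are hexadecimal, so a hash using other letters would be rejected.

```python
import re

# Pattern from the suggestion above: 6 to 64 hexadecimal characters.
HEX_HASH = re.compile(r"[0-9a-fA-F]{6,64}")


def looks_like_hex_hash(value: str) -> bool:
    """True only if the whole value is a 6-64 character hex string."""
    return HEX_HASH.fullmatch(value) is not None


for candidate in ["2c5bb94", "ondemand", "a" * 20, "aB3xZ9q", "2c5bb"]:
    print(candidate, looks_like_hex_hash(candidate))
```

The hash from the PR description passes, the literal name is rejected without a separate comparison, and a 20-character hex value that the old `<= 12` check would have dropped is accepted.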
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@twikit/x_client_transaction/transaction.py` around lines 61 - 64, The loop
that assigns file_hash is rejecting candidates by a hard-coded length check
("len(val) <= 12"), which can drop valid webpack chunk hashes; remove that
arbitrary constraint in the block that iterates over all_matches (the for m in
all_matches loop) and instead accept any non-'ondemand' match (val !=
'ondemand') or replace the check with a proper validation (e.g., match against a
hex/base62 regex or a configurable max_hash_length) before assigning file_hash;
update references to file_hash accordingly so downstream logic performs
definitive validation rather than relying on the 12-character heuristic.
