NUTCH-3168 Sandbox Commons JEXL usage in crawl and index pipelines #909

Merged
lewismc merged 2 commits into apache:master from lewismc:NUTCH-3168
Apr 14, 2026

@lewismc (Member) commented Apr 13, 2026

PR for NUTCH-3168.
Includes unit tests that validate basic and common JEXL expressions as well as some edge cases.
I added a new configuration property, nutch.jexl.disable.sandbox, which lets users in trusted environments bypass the sandbox. Another important design consideration was maintaining compatibility with all existing JEXL scripts users may have written before this patch.
Finally, JEXL expression handling is standardized across pipelines so that behavior is predictable and auditable.
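
For illustration, a minimal sketch of how the bypass might be configured in a trusted environment. Only the property name comes from this PR. The placement in conf/nutch-site.xml (the usual location for local overrides), the boolean value, and the description text are assumptions; the definition in nutch-default.xml is authoritative.

```xml
<!-- Assumption: the property is a boolean and "true" disables the JEXL sandbox. -->
<property>
  <name>nutch.jexl.disable.sandbox</name>
  <value>true</value>
  <!-- Hypothetical description, not copied from the patch. -->
  <description>Bypass the JEXL sandbox. Use only when every JEXL
  expression comes from a trusted source.</description>
</property>
```

Leaving the property unset should keep the sandbox active, which is presumably the safer choice whenever expressions can come from untrusted input.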

@lewismc self-assigned this Apr 13, 2026

@sebastian-nagel (Contributor) left a comment

+1 Looks good. Thanks, @lewismc!

Tested the CrawlDbReader with a handful of JEXL expressions to filter the dumped CrawlDb.

@lewismc merged commit df62fa1 into apache:master Apr 14, 2026
10 checks passed
@lewismc deleted the NUTCH-3168 branch April 14, 2026 02:17