
[SPARK-56452][PYTHON] Fix pip install failure for prerelease versions #55311

Closed
kiyeonjeon21 wants to merge 3 commits into apache:master from kiyeonjeon21:SPARK-56452

@kiyeonjeon21
Contributor

What changes were proposed in this pull request?

Updated the version validation regexes in python/pyspark/install.py to accept prerelease version strings (e.g. 4.2.0.dev4).

Two regexes were updated:

  • checked_versions() (line 71): the regex that detects bare version strings and prepends spark-
  • convert_old_hadoop_version() (line 109): the regex that extracts major/minor version parts

Both now include an optional (?:\.dev[0-9]+)? suffix.
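As a rough illustration of the second regex, the sketch below extracts the major and minor parts from a version string while tolerating the optional `.devN` suffix. The function name `major_minor`, the capture groups, and the optional `spark-` prefix handling are assumptions made for this example; the exact pattern in python/pyspark/install.py may differ.

```python
import re

# Assumed pattern: X.Y.Z with the optional ".devN" suffix described above.
# The capture groups and the optional "spark-" prefix are illustrative only.
VERSION_PARTS = re.compile(
    r"^(?:spark-)?([0-9]+)\.([0-9]+)\.[0-9]+(?:\.dev[0-9]+)?$"
)


def major_minor(version):
    """Return (major, minor) as ints, or None if the string does not match."""
    match = VERSION_PARTS.match(version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Because the suffix group is non-capturing, the group indices for major and minor stay the same whether or not a `.devN` suffix is present.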

Why are the changes needed?

When installing a PySpark prerelease via pip (e.g. pip install pyspark==4.2.0.dev4), the version string 4.2.0.dev4 does not match the existing regex ^[0-9]+\.[0-9]+\.[0-9]+$. As a result, the spark- prefix is not prepended, which then triggers a RuntimeError:

RuntimeError: Spark version should start with 'spark-' prefix; however, got 4.2.0.dev4
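The failure can be reproduced in isolation with a minimal sketch of the prefix logic. `with_spark_prefix` is a hypothetical helper written for this example, not the function in install.py; `OLD_BARE` is the regex quoted above, and `NEW_BARE` is the assumed form after adding the optional dev suffix.

```python
import re

# Regex quoted in this description as the existing check.
OLD_BARE = re.compile(r"^[0-9]+\.[0-9]+\.[0-9]+$")
# Assumed form after the fix: the same pattern plus an optional ".devN" suffix.
NEW_BARE = re.compile(r"^[0-9]+\.[0-9]+\.[0-9]+(?:\.dev[0-9]+)?$")


def with_spark_prefix(version, bare_pattern):
    """Hypothetical helper: prepend 'spark-' to bare versions, then validate."""
    if bare_pattern.match(version):
        version = "spark-" + version
    if not version.startswith("spark-"):
        raise RuntimeError(
            "Spark version should start with 'spark-' prefix; however, got %s"
            % version
        )
    return version
```

With `OLD_BARE`, the string 4.2.0.dev4 skips the prefix step and reaches the RuntimeError branch; with `NEW_BARE`, it is prefixed and returned as spark-4.2.0.dev4.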

GitHub issue: #55289

Does this PR introduce any user-facing change?

Yes. Previously, pip install pyspark==4.2.0.dev4 would fail with a RuntimeError. After this fix, prerelease versions install correctly.

How was this patch tested?

Added two test cases in test_checked_versions for prerelease versions:

  • checked_versions("4.2.0.dev4", "3", "2.3") — bare version with dev suffix
  • checked_versions("spark-4.2.0.dev4", "hadoop3", "hive2.3") — already prefixed version with dev suffix

Was this patch authored or co-authored using generative AI tooling?

No.

The version validation regex in install.py rejects prerelease version
strings like "4.2.0.dev4" because it only matches X.Y.Z format. This
adds an optional ".devN" suffix to both regexes in checked_versions()
and convert_old_hadoop_version().
Member

@HyukjinKwon HyukjinKwon left a comment

Looks good but let's make the CI green to make sure.

@gaogaotiantian
Contributor

@HyukjinKwon do we plan to release any other pre-release versions in the future? In Python versioning semantics, dev comes first, followed by a for alpha, b for beta, and rc for release candidate.

@HyukjinKwon
Member

Maybe, but we will likely stop preview releases for now because we're going to release frequently.

@gaogaotiantian
Contributor

Yeah I'm asking because if we plan to do other pre-releases in the future we might just solve all of them together in this PR. If we won't do that, we can just detect dev here.

@HyukjinKwon
Member

Merged to master.
