llama + spec: MTP Support #22673
Merged
28 commits
9b996f0 spec: support MTP (am17an)
80e1f3c fix batch size (am17an)
8d16341 rename files (am17an)
5e1965d cont : simplify (#7) (ggerganov)
89f6e0d MTP: clean-up (#9) (am17an)
7ea1289 mtp -> draft-mtp (am17an)
9243e50 remove unused llama_arch (am17an)
23ae80a add need_embd in speculative (am17an)
a5b3e98 llama: allow partial seq_rm for GDN models for speculative decoding (am17an)
3aa9ddc fix pending state (am17an)
2ef737a vulkan: add GDN partial rollback (am17an)
d7443da meta: extend check to axis 1 (am17an)
19be81c metal: add GDN partial rollback (ggerganov)
d0759f0 delta_net_base: use ggml_pad instead of new_tensor (am17an)
78a78ae review: add need_rs_seq (am17an)
611f422 review: rename part_bounded to n_rs (am17an)
df4cd32 review: deslop comments (am17an)
9674711 review: rename, add asserts (am17an)
7b54ac5 server : adjust checkpoint logic (#11) (ggerganov)
749a0b2 server-context: fix early exit (am17an)
d42d25d spec : fix compatibility with n-gram and add TODOs (#13) (ggerganov)
cddbb7f llama-memory: enable checkpointing with partial rollback (am17an)
6ef79f7 cont: add test-case for loading into a dirty ctx (am17an)
0f6f0d6 llama-memory-recurrent: clear rs_idx in clear (am17an)
37a479f download: fix mtp path (am17an)
8e9a07d llama-arch: fix enorm op (am17an)
5a818cd docs: update docs (am17an)
2dff7ff conversion: fix type annotations (am17an)
TODO @ngxson make the function accept `opts` as an argument