Update grid for CUDA by aguevara22 · Pull Request #233 · KindXiaoming/pykan

aguevara22 · 2024-05-23T13:53:21Z

Hi Ziming!

I noticed that using "update_grid_from_samples", which really improves training on CPU, leads to NaNs in the network and its gradients when running on CUDA. The problem seems to come from the least-squares fit of the B-spline coefficients in spline.py:

coef = torch.linalg.lstsq(mat.to(device), y_eval.unsqueeze(dim=2).to(device),
                          driver='gelsy' if device == 'cpu' else 'gels').solution[:, :, 0]
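The matrix "mat" in the call above can lose rank whenever some B-spline basis functions are zero at every sample. Below is a minimal sketch that reproduces this, assuming a uniform cubic grid on [-1, 1] with samples crowded into one part of the domain. The helper bspline_basis is hypothetical: it is a plain Cox-de Boor evaluation written for illustration, not pykan's own basis routine. With "update_grid_from_samples" the grid follows the sample distribution, so this exact configuration may not arise; the sketch only shows the mechanism.

```python
import torch


def bspline_basis(x: torch.Tensor, grid: torch.Tensor, k: int) -> torch.Tensor:
    """Evaluate all degree-k B-spline basis functions at the points x.

    x:    (n,) sample points
    grid: (m,) knot vector, already extended by k knots on each side
    returns an (n, m - k - 1) matrix whose column j is B_{j,k}(x)
    """
    x = x.unsqueeze(1)  # (n, 1) so it broadcasts against the knots
    # Degree 0: indicator of the half-open knot interval [t_j, t_{j+1})
    value = ((x >= grid[:-1]) & (x < grid[1:])).to(x.dtype)
    for p in range(1, k + 1):
        # Cox-de Boor recursion: B_{j,p} from B_{j,p-1} and B_{j+1,p-1}
        left = (x - grid[: -(p + 1)]) / (grid[p:-1] - grid[: -(p + 1)]) * value[:, :-1]
        right = (grid[p + 1 :] - x) / (grid[p + 1 :] - grid[1:-p]) * value[:, 1:]
        value = left + right
    return value


k = 3              # cubic splines
num_intervals = 5  # grid intervals on [-1, 1]
h = 2.0 / num_intervals
# Uniform knots on [-1, 1], extended by k knots on each side -> 8 basis functions
grid = torch.linspace(-1 - k * h, 1 + k * h, num_intervals + 2 * k + 1, dtype=torch.float64)

# All samples fall in the left part of the domain, so the basis functions
# supported only on the right part are zero at every sample.
x = torch.linspace(-0.95, -0.5, 50, dtype=torch.float64)
mat = bspline_basis(x, grid, k)

n_zero_cols = int((mat.abs().sum(dim=0) == 0).sum())
rank = int(torch.linalg.matrix_rank(mat))
print(mat.shape[1], n_zero_cols, rank)  # 8 3 5
```

Three of the eight columns are identically zero, so the 50 x 8 matrix has rank 5 and a solver that assumes full column rank has no unique answer to return.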

Here "mat" holds the B-spline basis functions evaluated at the samples, and depending on the samples it may not be a full-rank matrix. On CUDA, torch.linalg.lstsq only supports the 'gels' driver, which assumes a full-rank matrix and therefore cannot handle degenerate (rank-deficient) cases. So I simply moved that operation to the CPU, where the 'gelsy' driver, which does handle rank-deficient matrices, can be used. There may be a better solution, but I'm committing this in case it's useful, since it worked for me on both CUDA and MPS.
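A minimal sketch of the workaround as described: solve on the CPU with 'gelsy' and move the coefficients back to the original device. The function names are hypothetical, the shapes are assumed to mirror the lstsq call in spline.py, and the actual change in e7987a8 may differ. For comparison, fit_coef_pinv shows a different technique that is not part of this PR: the SVD-based pseudoinverse, which also yields the minimum-norm least-squares solution and runs on CUDA without a CPU round trip.

```python
import torch


def fit_coef_gelsy_on_cpu(mat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Least-squares spline coefficients via 'gelsy' on the CPU.

    mat: (batch, n_samples, n_basis) basis values, possibly rank-deficient
    y:   (batch, n_samples) target values
    returns (batch, n_basis) coefficients on the device of `mat`
    """
    device = mat.device
    # On CUDA, torch.linalg.lstsq only accepts driver='gels', which assumes
    # full rank. 'gelsy' (QR with column pivoting) handles rank deficiency
    # but is CPU-only, so the solve runs on the CPU and is moved back.
    sol = torch.linalg.lstsq(mat.cpu(), y.unsqueeze(dim=2).cpu(), driver="gelsy").solution
    return sol[:, :, 0].to(device)


def fit_coef_pinv(mat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Alternative (not from the PR): minimum-norm solution through the
    SVD-based pseudoinverse, computed on whatever device `mat` lives on."""
    return (torch.linalg.pinv(mat) @ y.unsqueeze(dim=2))[:, :, 0]


# Rank-deficient example: the last 3 of 8 basis columns are zero at every sample.
torch.manual_seed(0)
mat = torch.randn(2, 50, 8, dtype=torch.float64)
mat[:, :, 5:] = 0
true_coef = torch.randn(2, 5, dtype=torch.float64)
y = (mat[:, :, :5] @ true_coef.unsqueeze(dim=2))[:, :, 0]

coef_a = fit_coef_gelsy_on_cpu(mat, y)
coef_b = fit_coef_pinv(mat, y)
print(torch.isfinite(coef_a).all().item(), torch.allclose(coef_a, coef_b, atol=1e-8))
```

Both functions return the minimum-norm solution, so the coefficients of the all-zero columns come out as zero instead of NaN. The CPU route adds a device-to-host copy on every call; the pseudoinverse avoids that copy at the cost of an SVD.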

Update spline.py

e7987a8

aguevara22 wants to merge 1 commit into KindXiaoming:master from aguevara22:Kolmorogov-Arnold-Networks---CUDA
