Update grid for CUDA #233

Open
aguevara22 wants to merge 1 commit into KindXiaoming:master from aguevara22:Kolmorogov-Arnold-Networks---CUDA

@aguevara22

Hi Ziming!

I noticed that calling "update_grid_from_samples", which really improves training on CPU, leads to NaNs in the network and its gradients when used on CUDA. The problem seems to come from the least-squares fit of the B-spline coefficients in spline.py:

coef = torch.linalg.lstsq(mat.to(device), y_eval.unsqueeze(dim=2).to(device),
driver='gelsy' if device == 'cpu' else 'gels').solution[:, :, 0]

Here "mat" is the matrix of B-spline basis values, which, depending on the samples, is not always full rank. It seems that the 'gels' driver, the one selected on non-CPU devices, cannot handle degenerate (rank-deficient) matrices. So I just moved that operation to the CPU, which allows using 'gelsy' and so handles degenerate matrices. Perhaps there is a better solution, but I'm committing this just in case, since it worked for me on both CUDA and MPS.
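
For reference, here is a minimal sketch of the CPU fallback described above. It is not the exact diff in this PR: the helper name `lstsq_on_cpu` and the toy tensors are hypothetical, and it assumes "mat" has shape (batch, n_samples, n_coef) and the targets have shape (batch, n_samples, 1). It solves the system on the CPU with 'gelsy' and moves the solution back to the original device.

```python
import torch


def lstsq_on_cpu(mat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Solve min ||mat @ x - y|| on the CPU with 'gelsy', then move the result back.

    Hypothetical helper for illustration only.
    mat: (batch, n_samples, n_coef), possibly rank deficient
    y:   (batch, n_samples, 1)
    """
    device = mat.device
    # 'gelsy' (QR with column pivoting) tolerates rank-deficient matrices,
    # but torch.linalg.lstsq only offers it for CPU inputs.
    solution = torch.linalg.lstsq(mat.cpu(), y.cpu(), driver="gelsy").solution
    return solution.to(device)


# Toy rank-deficient system: the second column is twice the first.
mat = torch.tensor([[[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]])
y = torch.tensor([[[1.0], [2.0], [3.0]]])

coef = lstsq_on_cpu(mat, y)[:, :, 0]  # shape (batch, n_coef)
residual = (mat @ coef.unsqueeze(-1) - y).abs().max()
```

The trade-off is a device round trip on every grid update, which is probably acceptable if the grid is updated only occasionally during training.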
