mostly stylistic refactor + improved edge case handling for softmax, log_softmax#3328
mostly stylistic refactor + improved edge case handling for softmax, log_softmax#3328jachymb wants to merge 3 commits into
Conversation
Jenkins Console Log Machine informationNo LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.3 LTS Release: 20.04 Codename: focalCPU: G++: Clang: |
| if (a.size() == 0) { | ||
| return a; | ||
| } | ||
| check_nonzero_size("softmax", "a", a); |
There was a problem hiding this comment.
So I think we actually want to do the opposite here. It is nice for stan programs if we can support size zero matrices / vectors so that parameters can be effectively turned off by making them size zero.
There was a problem hiding this comment.
I could do that, I just wanted a consistent behavior for softmax and log_softmax. But what do you think about the argument that in that case you have general sum(softmax(x))==1 except for the case that sum(softmax([])) == 0. Is that an acceptable behavior? I think it may in some cases prevent the user from "turning off the parameter by setting it to empty" in cofusing ways.
There was a problem hiding this comment.
To add to the discussion, please note that it really is necessary to implement this special case as a dedicated if guard, unlike other algorithms that work on empty list naturally. For softmax the expression v.maxCoeff() is undefined for []. For log_softmax, log_sum_exp([]) is undefined. This, in my opinion, shows some sort of "unnaturality" of defining it for [] which imho advises us against doing it, unless there's a strong practical reason to do so.
Summary
This a mostly stylistic refactor/cleanup of softmax function, in follow-up to #3313 (these changes should have been included there).
There was a bit of coding inconsistency about how softmax and log_softmax handle stuff and this cleans up the code to make them consistent and now they only differ in the computational implementation, not in templating and overload style. In particular, softmax is reimplemented in the more compact style of how log_softmax was implemented before.
There is additionally a small semantic change to the edge case where empty vector is passed as argument. Previously, log_softmax thrown invalid_argument (mistakenly documented as domain_error in doxygen) and softmax returned empty vector back. Now they both throw invalid_argument. My reasoning for that is that although one could argue the empty->empty case is mathematically okay-defined, I think specifically in terms of Stan programming where the output of softmax is typically interpreted as a probability distribution, but the empty vector doesn't sum to one, this is dubious. In other words, if you pass empty vector, you're likely doing something very questionable in the first place and you want to be warned instead of your input being silently swallowed.
Tests
Added a new tests for the edge case.
Side Effects
Minor change of behavior in the edge case.
Release notes
log_softmax and softmax now throw invalid_argument on empty input.
Checklist
[*] Copyright holder: Jáchym Barvínek jachymb@gmail.com
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
[*] the basic tests are passing
./runTests.py test/unit)make test-headers)make test-math-dependencies)make doxygen)make cpplint)[*] the code is written in idiomatic C++ and changes are documented in the doxygen
[*] the new changes are tested
** AI use disclosure
I used claude sonnet to help with this work and reviewed every single LOC myself.