Created concepts for samplers, added quotient_and_pdf variants to satisfy the concepts#1001
Created concepts for samplers, added quotient_and_pdf variants to satisfy the concepts#1001devshgraphicsprogramming merged 61 commits intomasterfrom
Conversation
…isfy the concepts
…concepts - Move codomain_and_*Pdf and domain_and_*Pdf structs into their own warp_and_pdf.hlsl header - Keeping quotient_and_pdf.hlsl focused on importance sampling quotients for BxDFs - Add SampleWithPDF, SampleWithRcpPDF, and SampleWithDensity concepts to validate sample types - Used concept composition (NBL_CONCEPT_REQ_TYPE_ALIAS_CONCEPT) to build ResamplableSampler on TractableSampler and BijectiveSampler on ResamplableSampler
… projected/spherical triangle
# Conflicts: # examples_tests
| // the accumulated cosine slightly outside [-1,1] on GPU, making acos | ||
| // return NaN. GPU max(NaN,0)=0 then silently zeroes the solid angle. | ||
| retval.solid_angle = hlsl::max(angle_adder.getClampedSumOfArccosMinusPi(), scalar_type(0.0)); | ||
|
|
There was a problem hiding this comment.
I accept this, but why are you maxing something thats already clamped ? Why not clamp with the proper thing? I'm pretty sure compiler will have hard time optimizing the max on top of clamp
Fruthermore if you are clamping the sum, is there a point to clamping cos_vertices ?
There was a problem hiding this comment.
Replaced max(..., 0) with clamp(..., 0, 2*pi), getClampedSumOfArccosMinusPi() only clamps the acos input not the result
| // cos_a - cos_b * cos_c - (1/sin_b * 1/sin_c) | ||
| retval.cos_vertices = hlsl::clamp((retval.cos_sides - retval.cos_sides.yzx * retval.cos_sides.zxy) * retval.csc_sides.yzx * retval.csc_sides.zxy, hlsl::promote<vector3_type>(-1.0), hlsl::promote<vector3_type>(1.0)); // using Spherical Law of Cosines | ||
| retval.sin_vertices = hlsl::sqrt(hlsl::promote<vector3_type>(1.0) - retval.cos_vertices * retval.cos_vertices); |
There was a problem hiding this comment.
who will be hurt (who's not already clamping) if cos_vertices is unclamped?
its simpler to clamp a square when computing sin_vertices = 1.0-cos*cos because a square is always positive, so you only need a min not a clamp
| retval.triCosC = tri.cos_sides[2]; | ||
| // precompute great circle normal of arc AC: cross(A,C) has magnitude sin(b), | ||
| // so multiplying by csc(b) normalizes it; zero when side AC is degenerate | ||
| const scalar_type cscB = tri.csc_sides[1]; | ||
| const vector3_type arcACPlaneNormal = hlsl::cross(tri.vertices[0], tri.vertices[2]) * hlsl::select(cscB < numeric_limits<scalar_type>::max, cscB, scalar_type(0)); | ||
| retval.e_C = hlsl::cross(arcACPlaneNormal, tri.vertices[0]); | ||
| retval.cosA = tri.cos_vertices[0]; | ||
| retval.sinA = tri.sin_vertices[0]; | ||
| if (Algorithm == STA_ARVO) | ||
| { | ||
| retval.sinA_triCosC = retval.sinA * retval.triCosC; | ||
| retval.eCdotB = hlsl::dot(retval.e_C, tri.vertices[1]); | ||
| } | ||
| return retval; | ||
| } | ||
|
|
||
| codomain_type generate(const domain_type u, NBL_REF_ARG(cache_type) cache) NBL_CONST_MEMBER_FUNC | ||
| { | ||
| // Step 1: compute sin/cos of A_hat and the angle difference (A_hat - alpha) | ||
| const scalar_type A_hat = u.x / rcpSolidAngle; | ||
| scalar_type sinA_hat, cosA_hat; | ||
| math::sincos(A_hat, sinA_hat, cosA_hat); | ||
| const scalar_type s = sinA_hat * cosA - cosA_hat * sinA; // sin(A_hat - alpha) | ||
| const scalar_type t = cosA_hat * cosA + sinA_hat * sinA; // cos(A_hat - alpha) |
| cosBp = nbl::hlsl::clamp(cosBp, scalar_type(-1), scalar_type(1)); | ||
| sinBp = sqrt<scalar_type>(max<scalar_type>(scalar_type(0), scalar_type(1) - cosBp * cosBp)); |
There was a problem hiding this comment.
cosBp is already clamped, IEEE754 makes it impossible for 1-cosBp*cosBp to be outside of [-1,1]
| if (triCscB < numeric_limits<scalar_type>::max) | ||
| { | ||
| const scalar_type cosAngleAlongAC = ((v_ * q - u_ * p) * cosA - v_) / ((v_ * p + u_ * q) * sinA); | ||
| if (nbl::hlsl::abs(cosAngleAlongAC) < 1.f) | ||
| C_s += math::quaternion<scalar_type>::slerp_delta(tri_vertices[0], tri_vertices[2] * triCscB, cosAngleAlongAC); | ||
| } | ||
|
|
||
| vector3_type retval = tri_vertices[1]; | ||
| const scalar_type cosBC_s = nbl::hlsl::dot(C_s, tri_vertices[1]); | ||
| const scalar_type csc_b_s = 1.0 / nbl::hlsl::sqrt(1.0 - cosBC_s * cosBC_s); | ||
| if (csc_b_s < numeric_limits<scalar_type>::max) | ||
| { | ||
| const scalar_type cosAngleAlongBC_s = scalar_type(1.0) + cosBC_s * u.y - u.y; | ||
| if (nbl::hlsl::abs(cosAngleAlongBC_s) < scalar_type(1.0)) | ||
| retval += math::quaternion<scalar_type>::slerp_delta(tri_vertices[1], C_s * csc_b_s, cosAngleAlongBC_s); | ||
| } |
There was a problem hiding this comment.
how fast would this have been if you replaced the if statements with hlsl::select and have no branching plus inversesqrt and the fixed sincos ?
There was a problem hiding this comment.
esp given that bits of slerp_delta could be precomputed in create #1001 (comment)
| // mirrored from base for uniform access across both specializations | ||
| scalar_type rcpSolidAngle; |
There was a problem hiding this comment.
but base already keeps it ?
There was a problem hiding this comment.
used a method isntead of a direct member poking to provide access to outsiders #1054
There was a problem hiding this comment.
Resolved by Matt
| const scalar_type rcpR = scalar_type(1.0) / nbl::hlsl::sqrt(R_sq); | ||
| vector3_type cp = base.tri_vertices[0] * (tripleE * rcpR) + base.e_C * (-tripleA * rcpR); | ||
| // two intersections exist; pick the one on the minor arc A->C | ||
| if (nbl::hlsl::dot(cp, base.tri_vertices[0] + vertexC) < scalar_type(0.0)) |
There was a problem hiding this comment.
store base.tri_vertices[0] + vertexC instead of vertexC
Also why the if-statement you could just do cp *= sign(dot()); or cp *= ieee754:::... if you don't want to mul by 0 when dot is 0
| domain_type generateInverse(const codomain_type L) NBL_CONST_MEMBER_FUNC | ||
| { | ||
| // Step 1: find C' = intersection of great circles (B,L) and (A,C) | ||
| const vector3_type BxL = nbl::hlsl::cross(base.tri_vertices[1], L); | ||
| const scalar_type sinBL_sq = nbl::hlsl::dot(BxL, BxL); | ||
| if (sinBL_sq < numeric_limits<scalar_type>::epsilon) | ||
| { | ||
| // L ~ B: u.y ~ 0, u.x is indeterminate (all u.x map to B when u.y=0). | ||
| // Recover u.y from |L-B|^2 / |A-B|^2 (using C'=A; the (1-cosCpB) ratio | ||
| // cancels so any C' gives the same result). | ||
| const vector3_type LminusB = L - base.tri_vertices[1]; | ||
| const vector3_type AminusB = base.tri_vertices[0] - base.tri_vertices[1]; | ||
| const scalar_type v_num = nbl::hlsl::dot(LminusB, LminusB); | ||
| const scalar_type v_denom = nbl::hlsl::dot(AminusB, AminusB); | ||
| const scalar_type v = hlsl::select(v_denom > numeric_limits<scalar_type>::epsilon, | ||
| nbl::hlsl::clamp(v_num / v_denom, scalar_type(0.0), scalar_type(1.0)), | ||
| scalar_type(0.0)); | ||
| return vector2_type(scalar_type(0.0), v); | ||
| } | ||
|
|
||
| // C' lies on arc AC, so C' = A*cos(t) + e_C*sin(t). | ||
| // C' also lies on the B-L plane, so dot(BxL, C') = 0. | ||
| // Solving: (cos(t), sin(t)) = (tripleE, -tripleA) / R | ||
| const scalar_type tripleA = nbl::hlsl::dot(BxL, base.tri_vertices[0]); | ||
| const scalar_type tripleE = nbl::hlsl::dot(BxL, base.e_C); | ||
| const scalar_type R_sq = tripleA * tripleA + tripleE * tripleE; | ||
| if (R_sq < numeric_limits<scalar_type>::epsilon) | ||
| return vector2_type(scalar_type(0.0), scalar_type(0.0)); | ||
|
|
||
| const scalar_type rcpR = scalar_type(1.0) / nbl::hlsl::sqrt(R_sq); | ||
| vector3_type cp = base.tri_vertices[0] * (tripleE * rcpR) + base.e_C * (-tripleA * rcpR); | ||
| // two intersections exist; pick the one on the minor arc A->C | ||
| if (nbl::hlsl::dot(cp, base.tri_vertices[0] + vertexC) < scalar_type(0.0)) | ||
| cp = -cp; | ||
|
|
||
| // Step 2: u.x = solidAngle(A,B,C') / solidAngle(A,B,C) | ||
| // Van Oosterom-Strackee: tan(Omega/2) = |A.(BxC')| / (1 + A.B + B.C' + C'.A) | ||
| // | ||
| // Numerator stability: the naive triple product dot(A, cross(B, C')) suffers | ||
| // catastrophic cancellation when C' is near A (small u.x), because | ||
| // cross(B, C') ~ cross(B, A) and dot(A, cross(B, A)) = 0 exactly. | ||
| // Expanding C' = cosBp*A + sinBp*e_C into the triple product: | ||
| // A.(BxC') = cosBp * A.(BxA) + sinBp * A.(Bxe_C) = sinBp * A.(Bxe_C) | ||
| // since A.(BxA) = 0 identically. This avoids the cancellation. | ||
| const scalar_type cosBp_inv = nbl::hlsl::dot(cp, base.tri_vertices[0]); | ||
| const scalar_type sinBp_inv = nbl::hlsl::dot(cp, base.e_C); | ||
| const scalar_type AxBdotE = nbl::hlsl::dot(base.tri_vertices[0], nbl::hlsl::cross(base.tri_vertices[1], base.e_C)); | ||
| const scalar_type num = sinBp_inv * AxBdotE; | ||
| const scalar_type cosCpB = nbl::hlsl::dot(base.tri_vertices[1], cp); | ||
| const scalar_type den = scalar_type(1.0) + base.triCosC + cosCpB + cosBp_inv; | ||
| const scalar_type subSolidAngle = scalar_type(2.0) * nbl::hlsl::atan2(nbl::hlsl::abs(num), den); | ||
| const scalar_type u = nbl::hlsl::clamp(subSolidAngle * rcpSolidAngle, scalar_type(0.0), scalar_type(1.0)); | ||
|
|
||
| // Step 3: u.y = |L-B|^2 / |C'-B|^2 | ||
| // Squared Euclidean distance avoids catastrophic cancellation vs (1-dot)/(1-dot) | ||
| const vector3_type LminusB = L - base.tri_vertices[1]; | ||
| const vector3_type cpMinusB = cp - base.tri_vertices[1]; | ||
| const scalar_type v_num = nbl::hlsl::dot(LminusB, LminusB); | ||
| const scalar_type v_denom = nbl::hlsl::dot(cpMinusB, cpMinusB); | ||
| const scalar_type v = hlsl::select(v_denom > numeric_limits<scalar_type>::epsilon, | ||
| nbl::hlsl::clamp(v_num / nbl::hlsl::max(v_denom, numeric_limits<scalar_type>::min), | ||
| scalar_type(0.0), scalar_type(1.0)), | ||
| scalar_type(0.0)); | ||
|
|
||
| return vector2_type(u, v); | ||
| } |
There was a problem hiding this comment.
this does not look faster than my old impl
There was a problem hiding this comment.
not a big deal if fake MIS weights are good enough
| @@ -87,7 +104,7 @@ struct SphericalTriangle | |||
| // See https://www.desmos.com/calculator/sdptomhbju | |||
| // Furthermore we could clip the polynomial calc to `Cu+D or `(Bu+C)u+D` for small arguments | |||
| const vector3_type pyramidAngles = hlsl::acos<vector3_type>(cos_sides); | |||
There was a problem hiding this comment.
btw it would make more sense to keep acos_csc_approx instead of cos_sides as a member
There was a problem hiding this comment.
or to ALSO keep it
There was a problem hiding this comment.
yeah should be precomputed and stored, please keep as a member (I guess we can eval projected solid angle for the same observer using many normals - mixture of BRDFs?)
| // r_disk / r_xy = sqrt(1-z) / sqrt(1-z^2) = 1/sqrt(1+z) | ||
| const T scale = T(1.0) / hlsl::sqrt<T>(T(1.0) + v.z); |
| const T z = T(1.0) - cmCache.r2; | ||
| const T xyScale = hlsl::sqrt<T>(hlsl::max<T>(T(0.0), T(2.0) - cmCache.r2)); |
There was a problem hiding this comment.
assert r2<=1.f then you don't need to clamp before the sqrt
does this pass any tests? why is it 2-r^2 !?
There was a problem hiding this comment.
shouldn't this just be a sqrt(r2) ?
There was a problem hiding this comment.
The disk point p has magnitude r, and we need |output_xy| = sqrt(1 - z^2) = sqrt(r^2(2 - r^2)).
Since output_xy = p * xyScale and |p| = r, the scale has to be sqrt(r^2(2-r^2)) / r = sqrt(2 - r^2).
There was a problem hiding this comment.
oooh, thanks for explaining
There was a problem hiding this comment.
Remove the max then, the r2 will never be above 1, so the sqrt argument will never be below 0.5
| { | ||
| ProjectedSphericalRectangle<T,UsePdfAsWeight> retval; | ||
| const vector3_type n = hlsl::mul(shape.basis, _receiverNormal); | ||
| retval.localReceiverNormal = n; |
There was a problem hiding this comment.
alternative is to produce 4 normalized corners in worldspace from a compressed rectangle
origin
origin+right
origin+up
origin+right+up
then 4 normalizations
here you're doing 4 full 3d normalizations with your lenSq, you're only saving a few MULs
the dot products are less expensive because its not 2 FMA and 1 MUL done 4 times over.
but the matrix mul to transform the receiver normal into rectangle local space is equivalent to 3 dots.
Add a create that takes a CompressedSphericalRectangle it might end up being faster because you're not using results of operations, there's a possibility of deeper pipelining/less stalls or different register usage
There was a problem hiding this comment.
left a TODO in the code on #1054

Examples PR
Notes:
quotient_and_pdf()methods inUniformHemisphere,UniformSphere,ProjectedHemisphere, andProjectedSphereshadow the struct typesampling::quotient_and_pdf<Q, P>fromquotient_and_pdf.hlsl. DXC can't resolve the return type because the method name takes precedence over the struct name during lookup. Fixed by fully qualifying with::nbl::hlsl::sampling::quotient_and_pdf<U, T>.