Ryujinx/Ryujinx.Graphics.Gpu/Engine/Threed/StateUpdater.cs

1381 lines
54 KiB
C#
Raw Normal View History

using Ryujinx.Common.Logging;
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
using Ryujinx.Common.Memory;
using Ryujinx.Graphics.GAL;
using Ryujinx.Graphics.Gpu.Engine.GPFifo;
using Ryujinx.Graphics.Gpu.Engine.Types;
using Ryujinx.Graphics.Gpu.Image;
using Ryujinx.Graphics.Gpu.Shader;
using Ryujinx.Graphics.Shader;
using Ryujinx.Graphics.Texture;
using System;
using System.Runtime.CompilerServices;
namespace Ryujinx.Graphics.Gpu.Engine.Threed
{
/// <summary>
/// GPU state updater.
/// </summary>
class StateUpdater
{
public const int ShaderStateIndex = 26;
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
public const int RasterizerStateIndex = 15;
public const int ScissorStateIndex = 16;
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
public const int VertexBufferStateIndex = 0;
public const int PrimitiveRestartStateIndex = 12;
private readonly GpuContext _context;
private readonly GpuChannel _channel;
private readonly DeviceStateWithShadow<ThreedClassState> _state;
private readonly DrawState _drawState;
private readonly StateUpdateTracker<ThreedClassState> _updateTracker;
private readonly ShaderProgramInfo[] _currentProgramInfo;
New shader cache implementation (#3194) * New shader cache implementation * Remove some debug code * Take transform feedback varying count into account * Create shader cache directory if it does not exist + fragment output map related fixes * Remove debug code * Only check texture descriptors if the constant buffer is bound * Also check CPU VA on GetSpanMapped * Remove more unused code and move cache related code * XML docs + remove more unused methods * Better codegen for TransformFeedbackDescriptor.AsSpan * Support migration from old cache format, remove more unused code Shader cache rebuild now also rewrites the shared toc and data files * Fix migration error with BRX shaders * Add a limit to the async translation queue Avoid async translation threads not being able to keep up and the queue growing very large * Re-create specialization state on recompile This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access * Make shader cache more error resilient * Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc * Address early PR feedback * Fix rebase * Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly * Handle some missing exceptions * Make shader cache purge delete both old and new shader caches * Register textures on new specialization state * Translate and compile shaders in forward order (eliminates diffs due to different binding numbers) * Limit in-flight shader compilation to the maximum number of compilation threads * Replace ParallelDiskCacheLoader state changed event with a callback function * Better handling for invalid constant buffer 1 data length * Do not create the old cache directory structure if the old cache does not exist * Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented) * Replace rectangle texture with just coordinate normalization * Skip incompatible shaders that are missing texture information, instead of crashing This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable * Fix coordinates normalization on cubemap textures * Check if title ID is null before combining shader cache path * More robust constant buffer address validation on spec state * More robust constant buffer address validation on spec state (2) * Regenerate shader cache with one stream, rather than one per shader. * Only create shader cache directory during initialization * Logging improvements * Proper shader program disposal * PR feedback, and add a comment on serialized structs * XML docs for RegisterTexture Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2022-04-10 06:49:44 -07:00
private ShaderSpecializationState _shaderSpecState;
private SpecializationStateUpdater _currentSpecState;
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
private ProgramPipelineState _pipeline;
private bool _vsUsesDrawParameters;
private bool _vtgWritesRtLayer;
private byte _vsClipDistancesWritten;
private uint _vbEnableMask;
private bool _prevDrawIndexed;
private bool _prevDrawIndirect;
private IndexType _prevIndexType;
private uint _prevFirstVertex;
private bool _prevTfEnable;
private uint _prevRtNoAlphaMask;
/// <summary>
/// Creates a new instance of the state updater.
/// </summary>
/// <param name="context">GPU context</param>
/// <param name="channel">GPU channel</param>
/// <param name="state">3D engine state</param>
/// <param name="drawState">Draw state</param>
/// <param name="spec">Specialization state updater</param>
public StateUpdater(GpuContext context, GpuChannel channel, DeviceStateWithShadow<ThreedClassState> state, DrawState drawState, SpecializationStateUpdater spec)
{
_context = context;
_channel = channel;
_state = state;
_drawState = drawState;
_currentProgramInfo = new ShaderProgramInfo[Constants.ShaderStages];
_currentSpecState = spec;
// ShaderState must be updated after other state updates, as specialization/pipeline state is used when fetching shaders.
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
// Render target state must appear after shader state as it depends on information from the currently bound shader.
// Rasterizer and scissor states are checked by render target clear, their indexes
// must be updated on the constants "RasterizerStateIndex" and "ScissorStateIndex" if modified.
// The vertex buffer state may be forced dirty when a indexed draw starts, the "VertexBufferStateIndex"
// constant must be updated if modified.
// The order of the other state updates doesn't matter.
_updateTracker = new StateUpdateTracker<ThreedClassState>(new[]
{
new StateUpdateCallbackEntry(UpdateVertexBufferState,
nameof(ThreedClassState.VertexBufferDrawState),
nameof(ThreedClassState.VertexBufferInstanced),
nameof(ThreedClassState.VertexBufferState),
nameof(ThreedClassState.VertexBufferEndAddress)),
// Must be done after vertex buffer updates.
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
new StateUpdateCallbackEntry(UpdateVertexAttribState, nameof(ThreedClassState.VertexAttribState)),
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
new StateUpdateCallbackEntry(UpdateBlendState,
nameof(ThreedClassState.BlendIndependent),
nameof(ThreedClassState.BlendConstant),
nameof(ThreedClassState.BlendStateCommon),
nameof(ThreedClassState.BlendEnableCommon),
nameof(ThreedClassState.BlendEnable),
nameof(ThreedClassState.BlendState)),
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
new StateUpdateCallbackEntry(UpdateFaceState, nameof(ThreedClassState.FaceState)),
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
new StateUpdateCallbackEntry(UpdateStencilTestState,
nameof(ThreedClassState.StencilBackMasks),
nameof(ThreedClassState.StencilTestState),
nameof(ThreedClassState.StencilBackTestState)),
new StateUpdateCallbackEntry(UpdateDepthTestState,
nameof(ThreedClassState.DepthTestEnable),
nameof(ThreedClassState.DepthWriteEnable),
nameof(ThreedClassState.DepthTestFunc)),
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
new StateUpdateCallbackEntry(UpdateTessellationState,
nameof(ThreedClassState.TessMode),
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
nameof(ThreedClassState.TessOuterLevel),
nameof(ThreedClassState.TessInnerLevel),
nameof(ThreedClassState.PatchVertices)),
new StateUpdateCallbackEntry(UpdateViewportTransform,
nameof(ThreedClassState.DepthMode),
nameof(ThreedClassState.ViewportTransform),
nameof(ThreedClassState.ViewportExtents),
nameof(ThreedClassState.YControl),
nameof(ThreedClassState.ViewportTransformEnable)),
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
new StateUpdateCallbackEntry(UpdateLogicOpState, nameof(ThreedClassState.LogicOpState)),
new StateUpdateCallbackEntry(UpdateDepthClampState, nameof(ThreedClassState.ViewVolumeClipControl)),
new StateUpdateCallbackEntry(UpdatePolygonMode,
nameof(ThreedClassState.PolygonModeFront),
nameof(ThreedClassState.PolygonModeBack)),
new StateUpdateCallbackEntry(UpdateDepthBiasState,
nameof(ThreedClassState.DepthBiasState),
nameof(ThreedClassState.DepthBiasFactor),
nameof(ThreedClassState.DepthBiasUnits),
nameof(ThreedClassState.DepthBiasClamp)),
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
new StateUpdateCallbackEntry(UpdatePrimitiveRestartState, nameof(ThreedClassState.PrimitiveRestartState)),
new StateUpdateCallbackEntry(UpdateLineState,
nameof(ThreedClassState.LineWidthSmooth),
nameof(ThreedClassState.LineSmoothEnable)),
new StateUpdateCallbackEntry(UpdateRtColorMask,
nameof(ThreedClassState.RtColorMaskShared),
nameof(ThreedClassState.RtColorMask)),
new StateUpdateCallbackEntry(UpdateRasterizerState, nameof(ThreedClassState.RasterizeEnable)),
new StateUpdateCallbackEntry(UpdateScissorState,
nameof(ThreedClassState.ScissorState),
nameof(ThreedClassState.ScreenScissorState)),
new StateUpdateCallbackEntry(UpdateTfBufferState, nameof(ThreedClassState.TfBufferState)),
new StateUpdateCallbackEntry(UpdateUserClipState, nameof(ThreedClassState.ClipDistanceEnable)),
new StateUpdateCallbackEntry(UpdateAlphaTestState,
nameof(ThreedClassState.AlphaTestEnable),
nameof(ThreedClassState.AlphaTestRef),
nameof(ThreedClassState.AlphaTestFunc)),
new StateUpdateCallbackEntry(UpdateSamplerPoolState,
nameof(ThreedClassState.SamplerPoolState),
nameof(ThreedClassState.SamplerIndex)),
new StateUpdateCallbackEntry(UpdateTexturePoolState, nameof(ThreedClassState.TexturePoolState)),
new StateUpdateCallbackEntry(UpdatePointState,
nameof(ThreedClassState.PointSize),
nameof(ThreedClassState.VertexProgramPointSize),
nameof(ThreedClassState.PointSpriteEnable),
nameof(ThreedClassState.PointCoordReplace)),
new StateUpdateCallbackEntry(UpdateIndexBufferState,
nameof(ThreedClassState.IndexBufferState),
nameof(ThreedClassState.IndexBufferCount)),
new StateUpdateCallbackEntry(UpdateMultisampleState,
nameof(ThreedClassState.AlphaToCoverageDitherEnable),
nameof(ThreedClassState.MultisampleControl)),
new StateUpdateCallbackEntry(UpdateEarlyZState,
nameof(ThreedClassState.EarlyZForce)),
new StateUpdateCallbackEntry(UpdateShaderState,
nameof(ThreedClassState.ShaderBaseAddress),
nameof(ThreedClassState.ShaderState)),
new StateUpdateCallbackEntry(UpdateRenderTargetState,
nameof(ThreedClassState.RtColorState),
nameof(ThreedClassState.RtDepthStencilState),
nameof(ThreedClassState.RtControl),
nameof(ThreedClassState.RtDepthStencilSize),
nameof(ThreedClassState.RtDepthStencilEnable)),
});
}
/// <summary>
/// Sets a register at a specific offset as dirty.
/// This must be called if the register value was modified.
/// </summary>
/// <param name="offset">Register offset</param>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void SetDirty(int offset)
{
_updateTracker.SetDirty(offset);
}
/// <summary>
/// Force all the guest state to be marked as dirty.
/// The next call to <see cref="Update"/> will update all the host state.
/// </summary>
public void SetAllDirty()
{
_updateTracker.SetAllDirty();
}
/// <summary>
/// Updates host state for any modified guest state, since the last time this function was called.
/// </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void Update()
{
// The vertex buffer size is calculated using a different
// method when doing indexed draws, so we need to make sure
// to update the vertex buffers if we are doing a regular
// draw after a indexed one and vice-versa.
if (_drawState.DrawIndexed != _prevDrawIndexed)
{
_updateTracker.ForceDirty(VertexBufferStateIndex);
// If PrimitiveRestartDrawArrays is false and this is a non-indexed draw, we need to ensure primitive restart is disabled.
// If PrimitiveRestartDrawArrays is false and this is a indexed draw, we need to ensure primitive restart enable matches GPU state.
// If PrimitiveRestartDrawArrays is true, then primitive restart enable should always match GPU state.
// That is because "PrimitiveRestartDrawArrays" is not configurable on the backend, it is always
// true on OpenGL and always false on Vulkan.
if (!_state.State.PrimitiveRestartDrawArrays && _state.State.PrimitiveRestartState.Enable)
{
_updateTracker.ForceDirty(PrimitiveRestartStateIndex);
}
_prevDrawIndexed = _drawState.DrawIndexed;
}
// Some draw parameters are used to restrict the vertex buffer size,
// but they can't be used on indirect draws because their values are unknown in this case.
// When switching between indirect and non-indirect draw, we need to
// make sure the vertex buffer sizes are still correct.
if (_drawState.DrawIndirect != _prevDrawIndirect)
{
_updateTracker.ForceDirty(VertexBufferStateIndex);
}
// In some cases, the index type is also used to guess the
// vertex buffer size, so we must update it if the type changed too.
if (_drawState.DrawIndexed &&
(_prevIndexType != _state.State.IndexBufferState.Type ||
_prevFirstVertex != _state.State.FirstVertex))
{
_updateTracker.ForceDirty(VertexBufferStateIndex);
_prevIndexType = _state.State.IndexBufferState.Type;
_prevFirstVertex = _state.State.FirstVertex;
}
bool tfEnable = _state.State.TfEnable;
if (!tfEnable && _prevTfEnable)
{
_context.Renderer.Pipeline.EndTransformFeedback();
_prevTfEnable = false;
}
_updateTracker.Update(ulong.MaxValue);
// If any state that the shader depends on changed,
// then we may need to compile/bind a different version
// of the shader for the new state.
if (_shaderSpecState != null && _currentSpecState.HasChanged())
{
if (!_shaderSpecState.MatchesGraphics(_channel, ref _currentSpecState.GetPoolState(), ref _currentSpecState.GetGraphicsState(), _vsUsesDrawParameters, false))
{
// Shader must be reloaded. _vtgWritesRtLayer should not change.
UpdateShaderState();
}
}
CommitBindings();
if (tfEnable && !_prevTfEnable)
{
_context.Renderer.Pipeline.BeginTransformFeedback(_drawState.Topology);
_prevTfEnable = true;
}
}
/// <summary>
/// Updates the host state for any modified guest state group with the respective bit set on <paramref name="mask"/>.
/// </summary>
/// <param name="mask">Mask, where each bit set corresponds to a group index that should be checked and updated</param>
public void Update(ulong mask)
{
_updateTracker.Update(mask);
}
/// <summary>
/// Ensures that the bindings are visible to the host GPU.
/// Note: this actually performs the binding using the host graphics API.
/// </summary>
private void CommitBindings()
{
UpdateStorageBuffers();
bool unalignedChanged = _currentSpecState.SetHasUnalignedStorageBuffer(_channel.BufferManager.HasUnalignedStorageBuffers);
if (!_channel.TextureManager.CommitGraphicsBindings(_shaderSpecState) || unalignedChanged)
{
// Shader must be reloaded. _vtgWritesRtLayer should not change.
UpdateShaderState();
}
_channel.BufferManager.CommitGraphicsBindings();
}
/// <summary>
/// Updates storage buffer bindings.
/// </summary>
private void UpdateStorageBuffers()
{
for (int stage = 0; stage < Constants.ShaderStages; stage++)
{
ShaderProgramInfo info = _currentProgramInfo[stage];
if (info == null)
{
continue;
}
for (int index = 0; index < info.SBuffers.Count; index++)
{
BufferDescriptor sb = info.SBuffers[index];
ulong sbDescAddress = _channel.BufferManager.GetGraphicsUniformBufferAddress(stage, 0);
int sbDescOffset = 0x110 + stage * 0x100 + sb.Slot * 0x10;
sbDescAddress += (ulong)sbDescOffset;
SbDescriptor sbDescriptor = _channel.MemoryManager.Physical.Read<SbDescriptor>(sbDescAddress);
_channel.BufferManager.SetGraphicsStorageBuffer(stage, sb.Slot, sbDescriptor.PackAddress(), (uint)sbDescriptor.Size, sb.Flags);
}
}
}
/// <summary>
/// Updates tessellation state based on the guest GPU state.
/// </summary>
private void UpdateTessellationState()
{
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
_pipeline.PatchControlPoints = (uint)_state.State.PatchVertices;
_context.Renderer.Pipeline.SetPatchParameters(
_state.State.PatchVertices,
2022-08-11 14:07:37 -07:00
_state.State.TessOuterLevel.AsSpan(),
_state.State.TessInnerLevel.AsSpan());
_currentSpecState.SetTessellationMode(_state.State.TessMode);
}
/// <summary>
/// Updates transform feedback buffer state based on the guest GPU state.
/// </summary>
private void UpdateTfBufferState()
{
for (int index = 0; index < Constants.TotalTransformFeedbackBuffers; index++)
{
TfBufferState tfb = _state.State.TfBufferState[index];
if (!tfb.Enable)
{
_channel.BufferManager.SetTransformFeedbackBuffer(index, 0, 0);
continue;
}
_channel.BufferManager.SetTransformFeedbackBuffer(index, tfb.Address.Pack(), (uint)tfb.Size);
}
}
/// <summary>
/// Updates Rasterizer primitive discard state based on guest gpu state.
/// </summary>
private void UpdateRasterizerState()
{
bool enable = _state.State.RasterizeEnable;
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
_pipeline.RasterizerDiscard = !enable;
_context.Renderer.Pipeline.SetRasterizerDiscard(!enable);
}
/// <summary>
/// Updates render targets (color and depth-stencil buffers) based on current render target state.
/// </summary>
private void UpdateRenderTargetState()
{
UpdateRenderTargetState(true);
}
/// <summary>
/// Updates render targets (color and depth-stencil buffers) based on current render target state.
/// </summary>
/// <param name="useControl">Use draw buffers information from render target control register</param>
/// <param name="layered">Indicates if the texture is layered</param>
/// <param name="singleUse">If this is not -1, it indicates that only the given indexed target will be used.</param>
public void UpdateRenderTargetState(bool useControl, bool layered = false, int singleUse = -1)
{
var memoryManager = _channel.MemoryManager;
var rtControl = _state.State.RtControl;
int count = useControl ? rtControl.UnpackCount() : Constants.TotalRenderTargets;
var msaaMode = _state.State.RtMsaaMode;
int samplesInX = msaaMode.SamplesInX();
int samplesInY = msaaMode.SamplesInY();
var scissor = _state.State.ScreenScissorState;
Size sizeHint = new Size(scissor.X + scissor.Width, scissor.Y + scissor.Height, 1);
int clipRegionWidth = int.MaxValue;
int clipRegionHeight = int.MaxValue;
bool changedScale = false;
uint rtNoAlphaMask = 0;
for (int index = 0; index < Constants.TotalRenderTargets; index++)
{
int rtIndex = useControl ? rtControl.UnpackPermutationIndex(index) : index;
var colorState = _state.State.RtColorState[rtIndex];
if (index >= count || !IsRtEnabled(colorState))
{
changedScale |= _channel.TextureManager.SetRenderTargetColor(index, null);
continue;
}
if (colorState.Format.NoAlpha())
{
rtNoAlphaMask |= 1u << index;
}
Image.Texture color = memoryManager.Physical.TextureCache.FindOrCreateTexture(
memoryManager,
colorState,
_vtgWritesRtLayer || layered,
samplesInX,
samplesInY,
sizeHint);
changedScale |= _channel.TextureManager.SetRenderTargetColor(index, color);
if (color != null)
{
if (clipRegionWidth > color.Width / samplesInX)
{
clipRegionWidth = color.Width / samplesInX;
}
if (clipRegionHeight > color.Height / samplesInY)
{
clipRegionHeight = color.Height / samplesInY;
}
}
}
bool dsEnable = _state.State.RtDepthStencilEnable;
Image.Texture depthStencil = null;
if (dsEnable)
{
var dsState = _state.State.RtDepthStencilState;
var dsSize = _state.State.RtDepthStencilSize;
depthStencil = memoryManager.Physical.TextureCache.FindOrCreateTexture(
memoryManager,
dsState,
dsSize,
_vtgWritesRtLayer || layered,
samplesInX,
samplesInY,
sizeHint);
if (depthStencil != null)
{
if (clipRegionWidth > depthStencil.Width / samplesInX)
{
clipRegionWidth = depthStencil.Width / samplesInX;
}
if (clipRegionHeight > depthStencil.Height / samplesInY)
{
clipRegionHeight = depthStencil.Height / samplesInY;
}
}
}
changedScale |= _channel.TextureManager.SetRenderTargetDepthStencil(depthStencil);
if (changedScale)
{
float oldScale = _channel.TextureManager.RenderTargetScale;
_channel.TextureManager.UpdateRenderTargetScale(singleUse);
if (oldScale != _channel.TextureManager.RenderTargetScale)
{
_context.Renderer.Pipeline.SetRenderTargetScale(_channel.TextureManager.RenderTargetScale);
UpdateViewportTransform();
UpdateScissorState();
}
}
_channel.TextureManager.SetClipRegion(clipRegionWidth, clipRegionHeight);
if (useControl && _prevRtNoAlphaMask != rtNoAlphaMask)
{
_prevRtNoAlphaMask = rtNoAlphaMask;
UpdateBlendState();
}
}
/// <summary>
/// Checks if a render target color buffer is used.
/// </summary>
/// <param name="colorState">Color buffer information</param>
/// <returns>True if the specified buffer is enabled/used, false otherwise</returns>
private static bool IsRtEnabled(RtColorState colorState)
{
// Colors are disabled by writing 0 to the format.
return colorState.Format != 0 && colorState.WidthOrStride != 0;
}
/// <summary>
/// Updates host scissor test state based on current GPU state.
/// </summary>
public void UpdateScissorState()
{
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
const int MinX = 0;
const int MinY = 0;
const int MaxW = 0xffff;
const int MaxH = 0xffff;
Span<Rectangle<int>> regions = stackalloc Rectangle<int>[Constants.TotalViewports];
for (int index = 0; index < Constants.TotalViewports; index++)
{
ScissorState scissor = _state.State.ScissorState[index];
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
bool enable = scissor.Enable && (scissor.X1 != MinX ||
scissor.Y1 != MinY ||
scissor.X2 != MaxW ||
scissor.Y2 != MaxH);
if (enable)
{
int x = scissor.X1;
int y = scissor.Y1;
int width = scissor.X2 - x;
int height = scissor.Y2 - y;
if (_state.State.YControl.HasFlag(YControl.NegateY))
{
ref var screenScissor = ref _state.State.ScreenScissorState;
y = screenScissor.Height - height - y;
if (y < 0)
{
height += y;
y = 0;
}
}
float scale = _channel.TextureManager.RenderTargetScale;
if (scale != 1f)
{
x = (int)(x * scale);
y = (int)(y * scale);
2022-01-16 15:23:00 -08:00
width = (int)MathF.Ceiling(width * scale);
height = (int)MathF.Ceiling(height * scale);
}
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
regions[index] = new Rectangle<int>(x, y, width, height);
}
else
{
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
regions[index] = new Rectangle<int>(MinX, MinY, MaxW, MaxH);
}
}
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
_context.Renderer.Pipeline.SetScissors(regions);
}
/// <summary>
/// Updates host depth clamp state based on current GPU state.
/// </summary>
/// <param name="state">Current GPU state</param>
private void UpdateDepthClampState()
{
ViewVolumeClipControl clip = _state.State.ViewVolumeClipControl;
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
bool clamp = (clip & ViewVolumeClipControl.DepthClampDisabled) == 0;
_pipeline.DepthClampEnable = clamp;
_context.Renderer.Pipeline.SetDepthClamp(clamp);
}
/// <summary>
/// Updates host alpha test state based on current GPU state.
/// </summary>
private void UpdateAlphaTestState()
{
_context.Renderer.Pipeline.SetAlphaTest(
_state.State.AlphaTestEnable,
_state.State.AlphaTestRef,
_state.State.AlphaTestFunc);
_currentSpecState.SetAlphaTest(
_state.State.AlphaTestEnable,
_state.State.AlphaTestRef,
_state.State.AlphaTestFunc);
}
/// <summary>
/// Updates host depth test state based on current GPU state.
/// </summary>
private void UpdateDepthTestState()
{
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
DepthTestDescriptor descriptor = new DepthTestDescriptor(
_state.State.DepthTestEnable,
_state.State.DepthWriteEnable,
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
_state.State.DepthTestFunc);
_pipeline.DepthTest = descriptor;
_context.Renderer.Pipeline.SetDepthTest(descriptor);
}
/// <summary>
/// Updates host viewport transform and clipping state based on current GPU state.
/// </summary>
private void UpdateViewportTransform()
{
var yControl = _state.State.YControl;
var face = _state.State.FaceState;
bool disableTransform = _state.State.ViewportTransformEnable == 0;
UpdateFrontFace(yControl, face.FrontFace);
UpdateDepthMode();
bool flipY = yControl.HasFlag(YControl.NegateY);
Span<Viewport> viewports = stackalloc Viewport[Constants.TotalViewports];
for (int index = 0; index < Constants.TotalViewports; index++)
{
if (disableTransform)
{
ref var scissor = ref _state.State.ScreenScissorState;
float rScale = _channel.TextureManager.RenderTargetScale;
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
var scissorRect = new Rectangle<float>(0, 0, (scissor.X + scissor.Width) * rScale, (scissor.Y + scissor.Height) * rScale);
viewports[index] = new Viewport(scissorRect, ViewportSwizzle.PositiveX, ViewportSwizzle.PositiveY, ViewportSwizzle.PositiveZ, ViewportSwizzle.PositiveW, 0, 1);
continue;
}
ref var transform = ref _state.State.ViewportTransform[index];
ref var extents = ref _state.State.ViewportExtents[index];
float scaleX = MathF.Abs(transform.ScaleX);
float scaleY = transform.ScaleY;
if (flipY)
{
scaleY = -scaleY;
}
if (!_context.Capabilities.SupportsViewportSwizzle && transform.UnpackSwizzleY() == ViewportSwizzle.NegativeY)
{
scaleY = -scaleY;
}
float x = transform.TranslateX - scaleX;
float y = transform.TranslateY - scaleY;
float width = scaleX * 2;
float height = scaleY * 2;
float scale = _channel.TextureManager.RenderTargetScale;
if (scale != 1f)
{
x *= scale;
y *= scale;
width *= scale;
height *= scale;
}
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
Rectangle<float> region = new Rectangle<float>(x, y, width, height);
ViewportSwizzle swizzleX = transform.UnpackSwizzleX();
ViewportSwizzle swizzleY = transform.UnpackSwizzleY();
ViewportSwizzle swizzleZ = transform.UnpackSwizzleZ();
ViewportSwizzle swizzleW = transform.UnpackSwizzleW();
float depthNear = extents.DepthNear;
float depthFar = extents.DepthFar;
if (transform.ScaleZ < 0)
{
float temp = depthNear;
depthNear = depthFar;
depthFar = temp;
}
viewports[index] = new Viewport(region, swizzleX, swizzleY, swizzleZ, swizzleW, depthNear, depthFar);
}
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
_context.Renderer.Pipeline.SetDepthMode(GetDepthMode());
_context.Renderer.Pipeline.SetViewports(viewports, disableTransform);
_currentSpecState.SetViewportTransformDisable(_state.State.ViewportTransformEnable == 0);
_currentSpecState.SetDepthMode(GetDepthMode() == DepthMode.MinusOneToOne);
}
/// <summary>
/// Updates the depth mode (0 to 1 or -1 to 1) based on the current viewport and depth mode register state.
/// </summary>
private void UpdateDepthMode()
{
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
_context.Renderer.Pipeline.SetDepthMode(GetDepthMode());
}
/// <summary>
/// Updates polygon mode state based on current GPU state.
/// </summary>
private void UpdatePolygonMode()
{
_context.Renderer.Pipeline.SetPolygonMode(_state.State.PolygonModeFront, _state.State.PolygonModeBack);
}
/// <summary>
/// Updates host depth bias (also called polygon offset) state based on current GPU state.
/// </summary>
private void UpdateDepthBiasState()
{
var depthBias = _state.State.DepthBiasState;
float factor = _state.State.DepthBiasFactor;
float units = _state.State.DepthBiasUnits;
float clamp = _state.State.DepthBiasClamp;
PolygonModeMask enables;
enables = (depthBias.PointEnable ? PolygonModeMask.Point : 0);
enables |= (depthBias.LineEnable ? PolygonModeMask.Line : 0);
enables |= (depthBias.FillEnable ? PolygonModeMask.Fill : 0);
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
_pipeline.BiasEnable = enables;
_context.Renderer.Pipeline.SetDepthBias(enables, factor, units / 2f, clamp);
}
/// <summary>
/// Updates host stencil test state based on current GPU state.
/// </summary>
private void UpdateStencilTestState()
{
var backMasks = _state.State.StencilBackMasks;
var test = _state.State.StencilTestState;
var backTest = _state.State.StencilBackTestState;
CompareOp backFunc;
StencilOp backSFail;
StencilOp backDpPass;
StencilOp backDpFail;
int backFuncRef;
int backFuncMask;
int backMask;
if (backTest.TwoSided)
{
backFunc = backTest.BackFunc;
backSFail = backTest.BackSFail;
backDpPass = backTest.BackDpPass;
backDpFail = backTest.BackDpFail;
backFuncRef = backMasks.FuncRef;
backFuncMask = backMasks.FuncMask;
backMask = backMasks.Mask;
}
else
{
backFunc = test.FrontFunc;
backSFail = test.FrontSFail;
backDpPass = test.FrontDpPass;
backDpFail = test.FrontDpFail;
backFuncRef = test.FrontFuncRef;
backFuncMask = test.FrontFuncMask;
backMask = test.FrontMask;
}
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
StencilTestDescriptor descriptor = new StencilTestDescriptor(
test.Enable,
test.FrontFunc,
test.FrontSFail,
test.FrontDpPass,
test.FrontDpFail,
test.FrontFuncRef,
test.FrontFuncMask,
test.FrontMask,
backFunc,
backSFail,
backDpPass,
backDpFail,
backFuncRef,
backFuncMask,
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
backMask);
_pipeline.StencilTest = descriptor;
_context.Renderer.Pipeline.SetStencilTest(descriptor);
}
/// <summary>
/// Updates user-defined clipping based on the guest GPU state.
/// </summary>
private void UpdateUserClipState()
{
uint clipMask = _state.State.ClipDistanceEnable & _vsClipDistancesWritten;
for (int i = 0; i < Constants.TotalClipDistances; ++i)
{
_context.Renderer.Pipeline.SetUserClipDistance(i, (clipMask & (1 << i)) != 0);
}
}
/// <summary>
/// Updates current sampler pool address and size based on guest GPU state.
/// </summary>
private void UpdateSamplerPoolState()
{
var texturePool = _state.State.TexturePoolState;
var samplerPool = _state.State.SamplerPoolState;
var samplerIndex = _state.State.SamplerIndex;
int maximumId = samplerIndex == SamplerIndex.ViaHeaderIndex
? texturePool.MaximumId
: samplerPool.MaximumId;
_channel.TextureManager.SetGraphicsSamplerPool(samplerPool.Address.Pack(), maximumId, samplerIndex);
}
/// <summary>
/// Updates current texture pool address and size based on guest GPU state.
/// </summary>
private void UpdateTexturePoolState()
{
var texturePool = _state.State.TexturePoolState;
_channel.TextureManager.SetGraphicsTexturePool(texturePool.Address.Pack(), texturePool.MaximumId);
_channel.TextureManager.SetGraphicsTextureBufferIndex((int)_state.State.TextureBufferIndex);
_currentSpecState.SetPoolState(GetPoolState());
}
/// <summary>
/// Updates host vertex attributes based on guest GPU state.
/// </summary>
private void UpdateVertexAttribState()
{
uint vbEnableMask = _vbEnableMask;
Span<VertexAttribDescriptor> vertexAttribs = stackalloc VertexAttribDescriptor[Constants.TotalVertexAttribs];
for (int index = 0; index < Constants.TotalVertexAttribs; index++)
{
var vertexAttrib = _state.State.VertexAttribState[index];
int bufferIndex = vertexAttrib.UnpackBufferIndex();
if ((vbEnableMask & (1u << bufferIndex)) == 0)
{
// Using a vertex buffer that doesn't exist is invalid, so let's use a dummy attribute for those cases.
vertexAttribs[index] = new VertexAttribDescriptor(0, 0, true, Format.R32G32B32A32Float);
continue;
}
if (!FormatTable.TryGetAttribFormat(vertexAttrib.UnpackFormat(), out Format format))
{
Logger.Debug?.Print(LogClass.Gpu, $"Invalid attribute format 0x{vertexAttrib.UnpackFormat():X}.");
format = Format.R32G32B32A32Float;
}
vertexAttribs[index] = new VertexAttribDescriptor(
bufferIndex,
vertexAttrib.UnpackOffset(),
vertexAttrib.UnpackIsConstant(),
format);
}
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
_pipeline.SetVertexAttribs(vertexAttribs);
_context.Renderer.Pipeline.SetVertexAttribs(vertexAttribs);
_currentSpecState.SetAttributeTypes(ref _state.State.VertexAttribState);
}
/// <summary>
/// Updates host line width based on guest GPU state.
/// </summary>
private void UpdateLineState()
{
float width = _state.State.LineWidthSmooth;
bool smooth = _state.State.LineSmoothEnable;
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
_pipeline.LineWidth = width;
_context.Renderer.Pipeline.SetLineParameters(width, smooth);
}
/// <summary>
/// Updates host point size based on guest GPU state.
/// </summary>
private void UpdatePointState()
{
float size = _state.State.PointSize;
bool isProgramPointSize = _state.State.VertexProgramPointSize;
bool enablePointSprite = _state.State.PointSpriteEnable;
// TODO: Need to figure out a way to map PointCoordReplace enable bit.
Origin origin = (_state.State.PointCoordReplace & 4) == 0 ? Origin.LowerLeft : Origin.UpperLeft;
_context.Renderer.Pipeline.SetPointParameters(size, isProgramPointSize, enablePointSprite, origin);
_currentSpecState.SetProgramPointSizeEnable(isProgramPointSize);
_currentSpecState.SetPointSize(size);
}
/// <summary>
/// Updates host primitive restart based on guest GPU state.
/// </summary>
private void UpdatePrimitiveRestartState()
{
PrimitiveRestartState primitiveRestart = _state.State.PrimitiveRestartState;
bool enable = primitiveRestart.Enable && (_drawState.DrawIndexed || _state.State.PrimitiveRestartDrawArrays);
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
_pipeline.PrimitiveRestartEnable = enable;
_context.Renderer.Pipeline.SetPrimitiveRestart(enable, primitiveRestart.Index);
}
/// <summary>
/// Updates host index buffer binding based on guest GPU state.
/// </summary>
private void UpdateIndexBufferState()
{
var indexBuffer = _state.State.IndexBufferState;
if (_drawState.IndexCount == 0)
{
return;
}
ulong gpuVa = indexBuffer.Address.Pack();
// Do not use the end address to calculate the size, because
// the result may be much larger than the real size of the index buffer.
ulong size = (ulong)(_drawState.FirstIndex + _drawState.IndexCount);
switch (indexBuffer.Type)
{
case IndexType.UShort: size *= 2; break;
case IndexType.UInt: size *= 4; break;
}
_channel.BufferManager.SetIndexBuffer(gpuVa, size, indexBuffer.Type);
}
/// <summary>
/// Updates host vertex buffer bindings based on guest GPU state.
/// </summary>
private void UpdateVertexBufferState()
{
IndexType indexType = _state.State.IndexBufferState.Type;
bool indexTypeSmall = indexType == IndexType.UByte || indexType == IndexType.UShort;
_drawState.IsAnyVbInstanced = false;
bool drawIndexed = _drawState.DrawIndexed;
bool drawIndirect = _drawState.DrawIndirect;
uint vbEnableMask = 0;
for (int index = 0; index < Constants.TotalVertexBuffers; index++)
{
var vertexBuffer = _state.State.VertexBufferState[index];
if (!vertexBuffer.UnpackEnable())
{
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
_pipeline.VertexBuffers[index] = new BufferPipelineDescriptor(false, 0, 0);
_channel.BufferManager.SetVertexBuffer(index, 0, 0, 0, 0);
continue;
}
GpuVa endAddress = _state.State.VertexBufferEndAddress[index];
ulong address = vertexBuffer.Address.Pack();
if (_channel.MemoryManager.IsMapped(address))
{
vbEnableMask |= 1u << index;
}
int stride = vertexBuffer.UnpackStride();
bool instanced = _state.State.VertexBufferInstanced[index];
int divisor = instanced ? vertexBuffer.Divisor : 0;
_drawState.IsAnyVbInstanced |= divisor != 0;
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
ulong vbSize = endAddress.Pack() - address + 1;
ulong size;
if (_drawState.IbStreamer.HasInlineIndexData || drawIndexed || stride == 0 || instanced)
{
// This size may be (much) larger than the real vertex buffer size.
// Avoid calculating it this way, unless we don't have any other option.
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
size = vbSize;
if (stride > 0 && indexTypeSmall && drawIndexed && !drawIndirect && !instanced)
{
// If the index type is a small integer type, then we might be still able
// to reduce the vertex buffer size based on the maximum possible index value.
ulong maxVertexBufferSize = indexType == IndexType.UByte ? 0x100UL : 0x10000UL;
maxVertexBufferSize += _state.State.FirstVertex;
maxVertexBufferSize *= (uint)stride;
size = Math.Min(size, maxVertexBufferSize);
}
}
else
{
// For non-indexed draws, we can guess the size from the vertex count
// and stride.
int firstInstance = (int)_state.State.FirstInstance;
var drawState = _state.State.VertexBufferDrawState;
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
size = Math.Min(vbSize, (ulong)((firstInstance + drawState.First + drawState.Count) * stride));
}
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
_pipeline.VertexBuffers[index] = new BufferPipelineDescriptor(_channel.MemoryManager.IsMapped(address), stride, divisor);
_channel.BufferManager.SetVertexBuffer(index, address, size, stride, divisor);
}
if (_vbEnableMask != vbEnableMask)
{
_vbEnableMask = vbEnableMask;
UpdateVertexAttribState();
}
}
/// <summary>
/// Updates host face culling and orientation based on guest GPU state.
/// </summary>
private void UpdateFaceState()
{
var yControl = _state.State.YControl;
var face = _state.State.FaceState;
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
_pipeline.CullEnable = face.CullEnable;
_pipeline.CullMode = face.CullFace;
_context.Renderer.Pipeline.SetFaceCulling(face.CullEnable, face.CullFace);
UpdateFrontFace(yControl, face.FrontFace);
}
/// <summary>
/// Updates the front face based on the current front face and the origin.
/// </summary>
/// <param name="yControl">Y control register value, where the origin is located</param>
/// <param name="frontFace">Front face</param>
private void UpdateFrontFace(YControl yControl, FrontFace frontFace)
{
bool isUpperLeftOrigin = !yControl.HasFlag(YControl.TriangleRastFlip);
if (isUpperLeftOrigin)
{
frontFace = frontFace == FrontFace.CounterClockwise ? FrontFace.Clockwise : FrontFace.CounterClockwise;
}
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
_pipeline.FrontFace = frontFace;
_context.Renderer.Pipeline.SetFrontFace(frontFace);
}
/// <summary>
/// Updates host render target color masks, based on guest GPU state.
/// This defines which color channels are written to each color buffer.
/// </summary>
private void UpdateRtColorMask()
{
bool rtColorMaskShared = _state.State.RtColorMaskShared;
Span<uint> componentMasks = stackalloc uint[Constants.TotalRenderTargets];
for (int index = 0; index < Constants.TotalRenderTargets; index++)
{
var colorMask = _state.State.RtColorMask[rtColorMaskShared ? 0 : index];
uint componentMask;
componentMask = (colorMask.UnpackRed() ? 1u : 0u);
componentMask |= (colorMask.UnpackGreen() ? 2u : 0u);
componentMask |= (colorMask.UnpackBlue() ? 4u : 0u);
componentMask |= (colorMask.UnpackAlpha() ? 8u : 0u);
componentMasks[index] = componentMask;
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
_pipeline.ColorWriteMask[index] = componentMask;
}
_context.Renderer.Pipeline.SetRenderTargetColorMasks(componentMasks);
}
/// <summary>
/// Updates host render target color buffer blending state, based on guest state.
/// </summary>
private void UpdateBlendState()
{
bool blendIndependent = _state.State.BlendIndependent;
ColorF blendConstant = _state.State.BlendConstant;
if (blendIndependent)
{
for (int index = 0; index < Constants.TotalRenderTargets; index++)
{
bool enable = _state.State.BlendEnable[index];
var blend = _state.State.BlendState[index];
var descriptor = new BlendDescriptor(
enable,
blendConstant,
blend.ColorOp,
FilterBlendFactor(blend.ColorSrcFactor, index),
FilterBlendFactor(blend.ColorDstFactor, index),
blend.AlphaOp,
FilterBlendFactor(blend.AlphaSrcFactor, index),
FilterBlendFactor(blend.AlphaDstFactor, index));
_pipeline.BlendDescriptors[index] = descriptor;
_context.Renderer.Pipeline.SetBlendState(index, descriptor);
}
}
else
{
bool enable = _state.State.BlendEnable[0];
var blend = _state.State.BlendStateCommon;
var descriptor = new BlendDescriptor(
enable,
blendConstant,
blend.ColorOp,
FilterBlendFactor(blend.ColorSrcFactor, 0),
FilterBlendFactor(blend.ColorDstFactor, 0),
blend.AlphaOp,
FilterBlendFactor(blend.AlphaSrcFactor, 0),
FilterBlendFactor(blend.AlphaDstFactor, 0));
for (int index = 0; index < Constants.TotalRenderTargets; index++)
{
_pipeline.BlendDescriptors[index] = descriptor;
_context.Renderer.Pipeline.SetBlendState(index, descriptor);
}
}
}
/// <summary>
/// Gets a blend factor for the color target currently.
/// This will return <paramref name="factor"/> unless the target format has no alpha component,
/// in which case it will replace destination alpha factor with a constant factor of one or zero.
/// </summary>
/// <param name="factor">Input factor</param>
/// <param name="index">Color target index</param>
/// <returns>New blend factor</returns>
private BlendFactor FilterBlendFactor(BlendFactor factor, int index)
{
// If any color target format without alpha is being used, we need to make sure that
// if blend is active, it will not use destination alpha as a factor.
// That is required because RGBX formats are emulated using host RGBA formats.
if (_state.State.RtColorState[index].Format.NoAlpha())
{
switch (factor)
{
case BlendFactor.DstAlpha:
case BlendFactor.DstAlphaGl:
factor = BlendFactor.One;
break;
case BlendFactor.OneMinusDstAlpha:
case BlendFactor.OneMinusDstAlphaGl:
factor = BlendFactor.Zero;
break;
}
}
return factor;
}
/// <summary>
/// Updates host logical operation state, based on guest state.
/// </summary>
private void UpdateLogicOpState()
{
LogicalOpState logicOpState = _state.State.LogicOpState;
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
_pipeline.SetLogicOpState(logicOpState.Enable, logicOpState.LogicalOp);
_context.Renderer.Pipeline.SetLogicOpState(logicOpState.Enable, logicOpState.LogicalOp);
}
/// <summary>
/// Updates multisample state, based on guest state.
/// </summary>
private void UpdateMultisampleState()
{
bool alphaToCoverageEnable = (_state.State.MultisampleControl & 1) != 0;
bool alphaToOneEnable = (_state.State.MultisampleControl & 0x10) != 0;
_context.Renderer.Pipeline.SetMultisampleState(new MultisampleDescriptor(
alphaToCoverageEnable,
_state.State.AlphaToCoverageDitherEnable,
alphaToOneEnable));
_currentSpecState.SetAlphaToCoverageEnable(alphaToCoverageEnable, _state.State.AlphaToCoverageDitherEnable);
}
/// <summary>
/// Updates the early z flag, based on guest state.
/// </summary>
private void UpdateEarlyZState()
{
_currentSpecState.SetEarlyZForce(_state.State.EarlyZForce);
}
/// <summary>
/// Updates host shaders based on the guest GPU state.
/// </summary>
private void UpdateShaderState()
{
New shader cache implementation (#3194) * New shader cache implementation * Remove some debug code * Take transform feedback varying count into account * Create shader cache directory if it does not exist + fragment output map related fixes * Remove debug code * Only check texture descriptors if the constant buffer is bound * Also check CPU VA on GetSpanMapped * Remove more unused code and move cache related code * XML docs + remove more unused methods * Better codegen for TransformFeedbackDescriptor.AsSpan * Support migration from old cache format, remove more unused code Shader cache rebuild now also rewrites the shared toc and data files * Fix migration error with BRX shaders * Add a limit to the async translation queue Avoid async translation threads not being able to keep up and the queue growing very large * Re-create specialization state on recompile This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access * Make shader cache more error resilient * Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc * Address early PR feedback * Fix rebase * Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly * Handle some missing exceptions * Make shader cache purge delete both old and new shader caches * Register textures on new specialization state * Translate and compile shaders in forward order (eliminates diffs due to different binding numbers) * Limit in-flight shader compilation to the maximum number of compilation threads * Replace ParallelDiskCacheLoader state changed event with a callback function * Better handling for invalid constant buffer 1 data length * Do not create the old cache directory structure if the old cache does not exist * Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented) * Replace rectangle texture with just coordinate normalization * Skip incompatible shaders that are missing texture information, instead of crashing This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable * Fix coordinates normalization on cubemap textures * Check if title ID is null before combining shader cache path * More robust constant buffer address validation on spec state * More robust constant buffer address validation on spec state (2) * Regenerate shader cache with one stream, rather than one per shader. * Only create shader cache directory during initialization * Logging improvements * Proper shader program disposal * PR feedback, and add a comment on serialized structs * XML docs for RegisterTexture Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2022-04-10 06:49:44 -07:00
var shaderCache = _channel.MemoryManager.Physical.ShaderCache;
New shader cache implementation (#3194) * New shader cache implementation * Remove some debug code * Take transform feedback varying count into account * Create shader cache directory if it does not exist + fragment output map related fixes * Remove debug code * Only check texture descriptors if the constant buffer is bound * Also check CPU VA on GetSpanMapped * Remove more unused code and move cache related code * XML docs + remove more unused methods * Better codegen for TransformFeedbackDescriptor.AsSpan * Support migration from old cache format, remove more unused code Shader cache rebuild now also rewrites the shared toc and data files * Fix migration error with BRX shaders * Add a limit to the async translation queue Avoid async translation threads not being able to keep up and the queue growing very large * Re-create specialization state on recompile This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access * Make shader cache more error resilient * Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc * Address early PR feedback * Fix rebase * Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly * Handle some missing exceptions * Make shader cache purge delete both old and new shader caches * Register textures on new specialization state * Translate and compile shaders in forward order (eliminates diffs due to different binding numbers) * Limit in-flight shader compilation to the maximum number of compilation threads * Replace ParallelDiskCacheLoader state changed event with a callback function * Better handling for invalid constant buffer 1 data length * Do not create the old cache directory structure if the old cache does not exist * Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented) * Replace rectangle texture with just coordinate normalization * Skip incompatible shaders that are missing texture information, instead of crashing This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable * Fix coordinates normalization on cubemap textures * Check if title ID is null before combining shader cache path * More robust constant buffer address validation on spec state * More robust constant buffer address validation on spec state (2) * Regenerate shader cache with one stream, rather than one per shader. * Only create shader cache directory during initialization * Logging improvements * Proper shader program disposal * PR feedback, and add a comment on serialized structs * XML docs for RegisterTexture Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2022-04-10 06:49:44 -07:00
_vtgWritesRtLayer = false;
New shader cache implementation (#3194) * New shader cache implementation * Remove some debug code * Take transform feedback varying count into account * Create shader cache directory if it does not exist + fragment output map related fixes * Remove debug code * Only check texture descriptors if the constant buffer is bound * Also check CPU VA on GetSpanMapped * Remove more unused code and move cache related code * XML docs + remove more unused methods * Better codegen for TransformFeedbackDescriptor.AsSpan * Support migration from old cache format, remove more unused code Shader cache rebuild now also rewrites the shared toc and data files * Fix migration error with BRX shaders * Add a limit to the async translation queue Avoid async translation threads not being able to keep up and the queue growing very large * Re-create specialization state on recompile This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access * Make shader cache more error resilient * Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc * Address early PR feedback * Fix rebase * Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly * Handle some missing exceptions * Make shader cache purge delete both old and new shader caches * Register textures on new specialization state * Translate and compile shaders in forward order (eliminates diffs due to different binding numbers) * Limit in-flight shader compilation to the maximum number of compilation threads * Replace ParallelDiskCacheLoader state changed event with a callback function * Better handling for invalid constant buffer 1 data length * Do not create the old cache directory structure if the old cache does not exist * Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented) * Replace rectangle texture with just coordinate normalization * Skip incompatible shaders that are missing texture information, instead of crashing This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable * Fix coordinates normalization on cubemap textures * Check if title ID is null before combining shader cache path * More robust constant buffer address validation on spec state * More robust constant buffer address validation on spec state (2) * Regenerate shader cache with one stream, rather than one per shader. * Only create shader cache directory during initialization * Logging improvements * Proper shader program disposal * PR feedback, and add a comment on serialized structs * XML docs for RegisterTexture Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2022-04-10 06:49:44 -07:00
ShaderAddresses addresses = new ShaderAddresses();
Span<ulong> addressesSpan = addresses.AsSpan();
ulong baseAddress = _state.State.ShaderBaseAddress.Pack();
for (int index = 0; index < 6; index++)
{
var shader = _state.State.ShaderState[index];
if (!shader.UnpackEnable() && index != 1)
{
continue;
}
New shader cache implementation (#3194) * New shader cache implementation * Remove some debug code * Take transform feedback varying count into account * Create shader cache directory if it does not exist + fragment output map related fixes * Remove debug code * Only check texture descriptors if the constant buffer is bound * Also check CPU VA on GetSpanMapped * Remove more unused code and move cache related code * XML docs + remove more unused methods * Better codegen for TransformFeedbackDescriptor.AsSpan * Support migration from old cache format, remove more unused code Shader cache rebuild now also rewrites the shared toc and data files * Fix migration error with BRX shaders * Add a limit to the async translation queue Avoid async translation threads not being able to keep up and the queue growing very large * Re-create specialization state on recompile This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access * Make shader cache more error resilient * Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc * Address early PR feedback * Fix rebase * Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly * Handle some missing exceptions * Make shader cache purge delete both old and new shader caches * Register textures on new specialization state * Translate and compile shaders in forward order (eliminates diffs due to different binding numbers) * Limit in-flight shader compilation to the maximum number of compilation threads * Replace ParallelDiskCacheLoader state changed event with a callback function * Better handling for invalid constant buffer 1 data length * Do not create the old cache directory structure if the old cache does not exist * Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented) * Replace rectangle texture with just coordinate normalization * Skip incompatible shaders that are missing texture information, instead of crashing This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable * Fix coordinates normalization on cubemap textures * Check if title ID is null before combining shader cache path * More robust constant buffer address validation on spec state * More robust constant buffer address validation on spec state (2) * Regenerate shader cache with one stream, rather than one per shader. * Only create shader cache directory during initialization * Logging improvements * Proper shader program disposal * PR feedback, and add a comment on serialized structs * XML docs for RegisterTexture Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2022-04-10 06:49:44 -07:00
addressesSpan[index] = baseAddress + shader.Offset;
}
CachedShaderProgram gs = shaderCache.GetGraphicsShader(ref _state.State, ref _pipeline, _channel, ref _currentSpecState.GetPoolState(), ref _currentSpecState.GetGraphicsState(), addresses);
// Consume the modified flag for spec state so that it isn't checked again.
_currentSpecState.SetShader(gs);
New shader cache implementation (#3194) * New shader cache implementation * Remove some debug code * Take transform feedback varying count into account * Create shader cache directory if it does not exist + fragment output map related fixes * Remove debug code * Only check texture descriptors if the constant buffer is bound * Also check CPU VA on GetSpanMapped * Remove more unused code and move cache related code * XML docs + remove more unused methods * Better codegen for TransformFeedbackDescriptor.AsSpan * Support migration from old cache format, remove more unused code Shader cache rebuild now also rewrites the shared toc and data files * Fix migration error with BRX shaders * Add a limit to the async translation queue Avoid async translation threads not being able to keep up and the queue growing very large * Re-create specialization state on recompile This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access * Make shader cache more error resilient * Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc * Address early PR feedback * Fix rebase * Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly * Handle some missing exceptions * Make shader cache purge delete both old and new shader caches * Register textures on new specialization state * Translate and compile shaders in forward order (eliminates diffs due to different binding numbers) * Limit in-flight shader compilation to the maximum number of compilation threads * Replace ParallelDiskCacheLoader state changed event with a callback function * Better handling for invalid constant buffer 1 data length * Do not create the old cache directory structure if the old cache does not exist * Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented) * Replace rectangle texture with just coordinate normalization * Skip incompatible shaders that are missing texture information, instead of crashing This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable * Fix coordinates normalization on cubemap textures * Check if title ID is null before combining shader cache path * More robust constant buffer address validation on spec state * More robust constant buffer address validation on spec state (2) * Regenerate shader cache with one stream, rather than one per shader. * Only create shader cache directory during initialization * Logging improvements * Proper shader program disposal * PR feedback, and add a comment on serialized structs * XML docs for RegisterTexture Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2022-04-10 06:49:44 -07:00
_shaderSpecState = gs.SpecializationState;
byte oldVsClipDistancesWritten = _vsClipDistancesWritten;
New shader cache implementation (#3194) * New shader cache implementation * Remove some debug code * Take transform feedback varying count into account * Create shader cache directory if it does not exist + fragment output map related fixes * Remove debug code * Only check texture descriptors if the constant buffer is bound * Also check CPU VA on GetSpanMapped * Remove more unused code and move cache related code * XML docs + remove more unused methods * Better codegen for TransformFeedbackDescriptor.AsSpan * Support migration from old cache format, remove more unused code Shader cache rebuild now also rewrites the shared toc and data files * Fix migration error with BRX shaders * Add a limit to the async translation queue Avoid async translation threads not being able to keep up and the queue growing very large * Re-create specialization state on recompile This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access * Make shader cache more error resilient * Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc * Address early PR feedback * Fix rebase * Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly * Handle some missing exceptions * Make shader cache purge delete both old and new shader caches * Register textures on new specialization state * Translate and compile shaders in forward order (eliminates diffs due to different binding numbers) * Limit in-flight shader compilation to the maximum number of compilation threads * Replace ParallelDiskCacheLoader state changed event with a callback function * Better handling for invalid constant buffer 1 data length * Do not create the old cache directory structure if the old cache does not exist * Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented) * Replace rectangle texture with just coordinate normalization * Skip incompatible shaders that are missing texture information, instead of crashing This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable * Fix coordinates normalization on cubemap textures * Check if title ID is null before combining shader cache path * More robust constant buffer address validation on spec state * More robust constant buffer address validation on spec state (2) * Regenerate shader cache with one stream, rather than one per shader. * Only create shader cache directory during initialization * Logging improvements * Proper shader program disposal * PR feedback, and add a comment on serialized structs * XML docs for RegisterTexture Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2022-04-10 06:49:44 -07:00
_drawState.VsUsesInstanceId = gs.Shaders[1]?.Info.UsesInstanceId ?? false;
_vsUsesDrawParameters = gs.Shaders[1]?.Info.UsesDrawParameters ?? false;
New shader cache implementation (#3194) * New shader cache implementation * Remove some debug code * Take transform feedback varying count into account * Create shader cache directory if it does not exist + fragment output map related fixes * Remove debug code * Only check texture descriptors if the constant buffer is bound * Also check CPU VA on GetSpanMapped * Remove more unused code and move cache related code * XML docs + remove more unused methods * Better codegen for TransformFeedbackDescriptor.AsSpan * Support migration from old cache format, remove more unused code Shader cache rebuild now also rewrites the shared toc and data files * Fix migration error with BRX shaders * Add a limit to the async translation queue Avoid async translation threads not being able to keep up and the queue growing very large * Re-create specialization state on recompile This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access * Make shader cache more error resilient * Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc * Address early PR feedback * Fix rebase * Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly * Handle some missing exceptions * Make shader cache purge delete both old and new shader caches * Register textures on new specialization state * Translate and compile shaders in forward order (eliminates diffs due to different binding numbers) * Limit in-flight shader compilation to the maximum number of compilation threads * Replace ParallelDiskCacheLoader state changed event with a callback function * Better handling for invalid constant buffer 1 data length * Do not create the old cache directory structure if the old cache does not exist * Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented) * Replace rectangle texture with just coordinate normalization * Skip incompatible shaders that are missing texture information, instead of crashing This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable * Fix coordinates normalization on cubemap textures * Check if title ID is null before combining shader cache path * More robust constant buffer address validation on spec state * More robust constant buffer address validation on spec state (2) * Regenerate shader cache with one stream, rather than one per shader. * Only create shader cache directory during initialization * Logging improvements * Proper shader program disposal * PR feedback, and add a comment on serialized structs * XML docs for RegisterTexture Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2022-04-10 06:49:44 -07:00
_vsClipDistancesWritten = gs.Shaders[1]?.Info.ClipDistancesWritten ?? 0;
if (oldVsClipDistancesWritten != _vsClipDistancesWritten)
{
UpdateUserClipState();
}
UpdateShaderBindings(gs.Bindings);
New shader cache implementation (#3194) * New shader cache implementation * Remove some debug code * Take transform feedback varying count into account * Create shader cache directory if it does not exist + fragment output map related fixes * Remove debug code * Only check texture descriptors if the constant buffer is bound * Also check CPU VA on GetSpanMapped * Remove more unused code and move cache related code * XML docs + remove more unused methods * Better codegen for TransformFeedbackDescriptor.AsSpan * Support migration from old cache format, remove more unused code Shader cache rebuild now also rewrites the shared toc and data files * Fix migration error with BRX shaders * Add a limit to the async translation queue Avoid async translation threads not being able to keep up and the queue growing very large * Re-create specialization state on recompile This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access * Make shader cache more error resilient * Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc * Address early PR feedback * Fix rebase * Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly * Handle some missing exceptions * Make shader cache purge delete both old and new shader caches * Register textures on new specialization state * Translate and compile shaders in forward order (eliminates diffs due to different binding numbers) * Limit in-flight shader compilation to the maximum number of compilation threads * Replace ParallelDiskCacheLoader state changed event with a callback function * Better handling for invalid constant buffer 1 data length * Do not create the old cache directory structure if the old cache does not exist * Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented) * Replace rectangle texture with just coordinate normalization * Skip incompatible shaders that are missing texture information, instead of crashing This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable * Fix coordinates normalization on cubemap textures * Check if title ID is null before combining shader cache path * More robust constant buffer address validation on spec state * More robust constant buffer address validation on spec state (2) * Regenerate shader cache with one stream, rather than one per shader. * Only create shader cache directory during initialization * Logging improvements * Proper shader program disposal * PR feedback, and add a comment on serialized structs * XML docs for RegisterTexture Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2022-04-10 06:49:44 -07:00
for (int stageIndex = 0; stageIndex < Constants.ShaderStages; stageIndex++)
{
ShaderProgramInfo info = gs.Shaders[stageIndex + 1]?.Info;
if (info?.UsesRtLayer == true)
{
_vtgWritesRtLayer = true;
}
_currentProgramInfo[stageIndex] = info;
New shader cache implementation (#3194) * New shader cache implementation * Remove some debug code * Take transform feedback varying count into account * Create shader cache directory if it does not exist + fragment output map related fixes * Remove debug code * Only check texture descriptors if the constant buffer is bound * Also check CPU VA on GetSpanMapped * Remove more unused code and move cache related code * XML docs + remove more unused methods * Better codegen for TransformFeedbackDescriptor.AsSpan * Support migration from old cache format, remove more unused code Shader cache rebuild now also rewrites the shared toc and data files * Fix migration error with BRX shaders * Add a limit to the async translation queue Avoid async translation threads not being able to keep up and the queue growing very large * Re-create specialization state on recompile This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access * Make shader cache more error resilient * Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc * Address early PR feedback * Fix rebase * Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly * Handle some missing exceptions * Make shader cache purge delete both old and new shader caches * Register textures on new specialization state * Translate and compile shaders in forward order (eliminates diffs due to different binding numbers) * Limit in-flight shader compilation to the maximum number of compilation threads * Replace ParallelDiskCacheLoader state changed event with a callback function * Better handling for invalid constant buffer 1 data length * Do not create the old cache directory structure if the old cache does not exist * Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented) * Replace rectangle texture with just coordinate normalization * Skip incompatible shaders that are missing texture information, instead of crashing This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable * Fix coordinates normalization on cubemap textures * Check if title ID is null before combining shader cache path * More robust constant buffer address validation on spec state * More robust constant buffer address validation on spec state (2) * Regenerate shader cache with one stream, rather than one per shader. * Only create shader cache directory during initialization * Logging improvements * Proper shader program disposal * PR feedback, and add a comment on serialized structs * XML docs for RegisterTexture Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2022-04-10 06:49:44 -07:00
}
New shader cache implementation (#3194) * New shader cache implementation * Remove some debug code * Take transform feedback varying count into account * Create shader cache directory if it does not exist + fragment output map related fixes * Remove debug code * Only check texture descriptors if the constant buffer is bound * Also check CPU VA on GetSpanMapped * Remove more unused code and move cache related code * XML docs + remove more unused methods * Better codegen for TransformFeedbackDescriptor.AsSpan * Support migration from old cache format, remove more unused code Shader cache rebuild now also rewrites the shared toc and data files * Fix migration error with BRX shaders * Add a limit to the async translation queue Avoid async translation threads not being able to keep up and the queue growing very large * Re-create specialization state on recompile This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access * Make shader cache more error resilient * Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc * Address early PR feedback * Fix rebase * Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly * Handle some missing exceptions * Make shader cache purge delete both old and new shader caches * Register textures on new specialization state * Translate and compile shaders in forward order (eliminates diffs due to different binding numbers) * Limit in-flight shader compilation to the maximum number of compilation threads * Replace ParallelDiskCacheLoader state changed event with a callback function * Better handling for invalid constant buffer 1 data length * Do not create the old cache directory structure if the old cache does not exist * Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented) * Replace rectangle texture with just coordinate normalization * Skip incompatible shaders that are missing texture information, instead of crashing This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable * Fix coordinates normalization on cubemap textures * Check if title ID is null before combining shader cache path * More robust constant buffer address validation on spec state * More robust constant buffer address validation on spec state (2) * Regenerate shader cache with one stream, rather than one per shader. * Only create shader cache directory during initialization * Logging improvements * Proper shader program disposal * PR feedback, and add a comment on serialized structs * XML docs for RegisterTexture Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2022-04-10 06:49:44 -07:00
_context.Renderer.Pipeline.SetProgram(gs.HostProgram);
}
/// <summary>
/// Updates bindings consumed by the shader on the texture and buffer managers.
/// </summary>
/// <param name="bindings">Bindings for the active shader</param>
private void UpdateShaderBindings(CachedShaderBindings bindings)
New shader cache implementation (#3194) * New shader cache implementation * Remove some debug code * Take transform feedback varying count into account * Create shader cache directory if it does not exist + fragment output map related fixes * Remove debug code * Only check texture descriptors if the constant buffer is bound * Also check CPU VA on GetSpanMapped * Remove more unused code and move cache related code * XML docs + remove more unused methods * Better codegen for TransformFeedbackDescriptor.AsSpan * Support migration from old cache format, remove more unused code Shader cache rebuild now also rewrites the shared toc and data files * Fix migration error with BRX shaders * Add a limit to the async translation queue Avoid async translation threads not being able to keep up and the queue growing very large * Re-create specialization state on recompile This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access * Make shader cache more error resilient * Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc * Address early PR feedback * Fix rebase * Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly * Handle some missing exceptions * Make shader cache purge delete both old and new shader caches * Register textures on new specialization state * Translate and compile shaders in forward order (eliminates diffs due to different binding numbers) * Limit in-flight shader compilation to the maximum number of compilation threads * Replace ParallelDiskCacheLoader state changed event with a callback function * Better handling for invalid constant buffer 1 data length * Do not create the old cache directory structure if the old cache does not exist * Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented) * Replace rectangle texture with just coordinate normalization * Skip incompatible shaders that are missing texture information, instead of crashing This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable * Fix coordinates normalization on cubemap textures * Check if title ID is null before combining shader cache path * More robust constant buffer address validation on spec state * More robust constant buffer address validation on spec state (2) * Regenerate shader cache with one stream, rather than one per shader. * Only create shader cache directory during initialization * Logging improvements * Proper shader program disposal * PR feedback, and add a comment on serialized structs * XML docs for RegisterTexture Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2022-04-10 06:49:44 -07:00
{
_channel.TextureManager.SetGraphicsBindings(bindings);
_channel.BufferManager.SetGraphicsBufferBindings(bindings);
New shader cache implementation (#3194) * New shader cache implementation * Remove some debug code * Take transform feedback varying count into account * Create shader cache directory if it does not exist + fragment output map related fixes * Remove debug code * Only check texture descriptors if the constant buffer is bound * Also check CPU VA on GetSpanMapped * Remove more unused code and move cache related code * XML docs + remove more unused methods * Better codegen for TransformFeedbackDescriptor.AsSpan * Support migration from old cache format, remove more unused code Shader cache rebuild now also rewrites the shared toc and data files * Fix migration error with BRX shaders * Add a limit to the async translation queue Avoid async translation threads not being able to keep up and the queue growing very large * Re-create specialization state on recompile This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access * Make shader cache more error resilient * Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc * Address early PR feedback * Fix rebase * Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly * Handle some missing exceptions * Make shader cache purge delete both old and new shader caches * Register textures on new specialization state * Translate and compile shaders in forward order (eliminates diffs due to different binding numbers) * Limit in-flight shader compilation to the maximum number of compilation threads * Replace ParallelDiskCacheLoader state changed event with a callback function * Better handling for invalid constant buffer 1 data length * Do not create the old cache directory structure if the old cache does not exist * Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented) * Replace rectangle texture with just coordinate normalization * Skip incompatible shaders that are missing texture information, instead of crashing This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable * Fix coordinates normalization on cubemap textures * Check if title ID is null before combining shader cache path * More robust constant buffer address validation on spec state * More robust constant buffer address validation on spec state (2) * Regenerate shader cache with one stream, rather than one per shader. * Only create shader cache directory during initialization * Logging improvements * Proper shader program disposal * PR feedback, and add a comment on serialized structs * XML docs for RegisterTexture Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2022-04-10 06:49:44 -07:00
}
/// <summary>
/// Gets the current texture pool state.
/// </summary>
/// <returns>Texture pool state</returns>
New shader cache implementation (#3194) * New shader cache implementation * Remove some debug code * Take transform feedback varying count into account * Create shader cache directory if it does not exist + fragment output map related fixes * Remove debug code * Only check texture descriptors if the constant buffer is bound * Also check CPU VA on GetSpanMapped * Remove more unused code and move cache related code * XML docs + remove more unused methods * Better codegen for TransformFeedbackDescriptor.AsSpan * Support migration from old cache format, remove more unused code Shader cache rebuild now also rewrites the shared toc and data files * Fix migration error with BRX shaders * Add a limit to the async translation queue Avoid async translation threads not being able to keep up and the queue growing very large * Re-create specialization state on recompile This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access * Make shader cache more error resilient * Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc * Address early PR feedback * Fix rebase * Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly * Handle some missing exceptions * Make shader cache purge delete both old and new shader caches * Register textures on new specialization state * Translate and compile shaders in forward order (eliminates diffs due to different binding numbers) * Limit in-flight shader compilation to the maximum number of compilation threads * Replace ParallelDiskCacheLoader state changed event with a callback function * Better handling for invalid constant buffer 1 data length * Do not create the old cache directory structure if the old cache does not exist * Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented) * Replace rectangle texture with just coordinate normalization * Skip incompatible shaders that are missing texture information, instead of crashing This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable * Fix coordinates normalization on cubemap textures * Check if title ID is null before combining shader cache path * More robust constant buffer address validation on spec state * More robust constant buffer address validation on spec state (2) * Regenerate shader cache with one stream, rather than one per shader. * Only create shader cache directory during initialization * Logging improvements * Proper shader program disposal * PR feedback, and add a comment on serialized structs * XML docs for RegisterTexture Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2022-04-10 06:49:44 -07:00
private GpuChannelPoolState GetPoolState()
{
return new GpuChannelPoolState(
_state.State.TexturePoolState.Address.Pack(),
_state.State.TexturePoolState.MaximumId,
(int)_state.State.TextureBufferIndex);
}
/// <summary>
/// Gets the depth mode that is currently being used (zero to one or minus one to one).
/// </summary>
/// <returns>Current depth mode</returns>
Vulkan backend (#2518) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. :pensive: * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 14:26:06 -07:00
private DepthMode GetDepthMode()
{
ref var transform = ref _state.State.ViewportTransform[0];
ref var extents = ref _state.State.ViewportExtents[0];
DepthMode depthMode;
if (!float.IsInfinity(extents.DepthNear) &&
!float.IsInfinity(extents.DepthFar) &&
(extents.DepthFar - extents.DepthNear) != 0)
{
// Try to guess the depth mode being used on the high level API
// based on current transform.
// It is setup like so by said APIs:
// If depth mode is ZeroToOne:
// TranslateZ = Near
// ScaleZ = Far - Near
// If depth mode is MinusOneToOne:
// TranslateZ = (Near + Far) / 2
// ScaleZ = (Far - Near) / 2
// DepthNear/Far are sorted such as that Near is always less than Far.
depthMode = extents.DepthNear != transform.TranslateZ &&
extents.DepthFar != transform.TranslateZ
? DepthMode.MinusOneToOne
: DepthMode.ZeroToOne;
}
else
{
// If we can't guess from the viewport transform, then just use the depth mode register.
depthMode = (DepthMode)(_state.State.DepthMode & 1);
}
return depthMode;
}
/// <summary>
/// Forces the shaders to be rebound on the next draw.
/// </summary>
public void ForceShaderUpdate()
{
_updateTracker.ForceDirty(ShaderStateIndex);
}
}
}