POWER - Performance Optimizations With Extensive Ramifications (#2286)
* Refactoring of KMemoryManager class
* Replace some trivial uses of DRAM address with VA
* Get rid of GetDramAddressFromVa
* Abstracting more operations on derived page table class
* Run auto-format on KPageTableBase
* Managed to make TryConvertVaToPa private, few uses remains now
* Implement guest physical pages ref counting, remove manual freeing
* Make DoMmuOperation private and call new abstract methods only from the base class
* Pass pages count rather than size on Map/UnmapMemory
* Change memory managers to take host pointers
* Fix a guest memory leak and simplify KPageTable
* Expose new methods for host range query and mapping
* Some refactoring of MapPagesFromClientProcess to allow proper page ref counting and mapping without KPageLists
* Remove more uses of AddVaRangeToPageList, now only one remains (shared memory page checking)
* Add a SharedMemoryStorage class, will be useful for host mapping
* Sayonara AddVaRangeToPageList, you served us well
* Start to implement host memory mapping (WIP)
* Support memory tracking through host exception handling
* Fix some access violations from HLE service guest memory access and CPU
* Fix memory tracking
* Fix mapping list bugs, including a race and a error adding mapping ranges
* Simple page table for memory tracking
* Simple "volatile" region handle mode
* Update UBOs directly (experimental, rough)
* Fix the overlap check
* Only set non-modified buffers as volatile
* Fix some memory tracking issues
* Fix possible race in MapBufferFromClientProcess (block list updates were not locked)
* Write uniform update to memory immediately, only defer the buffer set.
* Fix some memory tracking issues
* Pass correct pages count on shared memory unmap
* Armeilleure Signal Handler v1 + Unix changes
Unix currently behaves like windows, rather than remapping physical
* Actually check if the host platform is unix
* Fix decommit on linux.
* Implement windows 10 placeholder shared memory, fix a buffer issue.
* Make PTC version something that will never match with master
* Remove testing variable for block count
* Add reference count for memory manager, fix dispose
Can still deadlock with OpenAL
* Add address validation, use page table for mapped check, add docs
Might clean up the page table traversing routines.
* Implement batched mapping/tracking.
* Move documentation, fix tests.
* Cleanup uniform buffer update stuff.
* Remove unnecessary assignment.
* Add unsafe host mapped memory switch
On by default. Would be good to turn this off for untrusted code (homebrew, exefs mods) and give the user the option to turn it on manually, though that requires some UI work.
* Remove C# exception handlers
They have issues due to current .NET limitations, so the meilleure one fully replaces them for now.
* Fix MapPhysicalMemory on the software MemoryManager.
* Null check for GetHostAddress, docs
* Add configuration for setting memory manager mode (not in UI yet)
* Add config to UI
* Fix type mismatch on Unix signal handler code emit
* Fix 6GB DRAM mode.
The size can be greater than `uint.MaxValue` when the DRAM is >4GB.
* Address some feedback.
* More detailed error if backing memory cannot be mapped.
* SetLastError on all OS functions for consistency
* Force pages dirty with UBO update instead of setting them directly.
Seems to be much faster across a few games. Need retesting.
* Rebase, configuration rework, fix mem tracking regression
* Fix race in FreePages
* Set memory managers null after decrementing ref count
* Remove readonly keyword, as this is now modified.
* Use a local variable for the signal handler rather than a register.
* Fix bug with buffer resize, and index/uniform buffer binding.
Should fix flickering in games.
* Add InvalidAccessHandler to MemoryTracking
Doesn't do anything yet
* Call invalid access handler on unmapped read/write.
Same rules as the regular memory manager.
* Make unsafe mapped memory its own MemoryManagerType
* Move FlushUboDirty into UpdateState.
* Buffer dirty cache, rather than ubo cache
Much cleaner, may be reusable for Inline2Memory updates.
* This doesn't return anything anymore.
* Add sigaction remove methods, correct a few function signatures.
* Return empty list of physical regions for size 0.
* Also on AddressSpaceManager
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2021-05-24 13:52:44 -07:00
|
|
|
|
using ARMeilleure.IntermediateRepresentation;
|
|
|
|
|
using ARMeilleure.Translation;
|
|
|
|
|
using System;
|
|
|
|
|
using System.Runtime.CompilerServices;
|
|
|
|
|
using System.Runtime.InteropServices;
|
|
|
|
|
|
Reduce JIT GC allocations (#2515)
* Turn `MemoryOperand` into a struct
* Remove `IntrinsicOperation`
* Remove `PhiNode`
* Remove `Node`
* Turn `Operand` into a struct
* Turn `Operation` into a struct
* Clean up pool management methods
* Add `Arena` allocator
* Move `OperationHelper` to `Operation.Factory`
* Move `OperandHelper` to `Operand.Factory`
* Optimize `Operation` a bit
* Fix `Arena` initialization
* Rename `NativeList<T>` to `ArenaList<T>`
* Reduce `Operand` size from 88 to 56 bytes
* Reduce `Operation` size from 56 to 40 bytes
* Add optimistic interning of Register & Constant operands
* Optimize `RegisterUsage` pass a bit
* Optimize `RemoveUnusedNodes` pass a bit
Iterating in reverse-order allows killing dependency chains in a single
pass.
* Fix PPTC symbols
* Optimize `BasicBlock` a bit
Reduce allocations from `_successor` & `DominanceFrontiers`
* Fix `Operation` resize
* Make `Arena` expandable
Change the arena allocator to be expandable by allocating in pages, with
some of them being pooled. Currently 32 pages are pooled. An LRU removal
mechanism should probably be added to it.
Apparently MHR can allocate bitmaps large enough to exceed the 16MB
limit for the type.
* Move `Arena` & `ArenaList` to `Common`
* Remove `ThreadStaticPool` & co
* Add `PhiOperation`
* Reduce `Operand` size from 56 from 48 bytes
* Add linear-probing to `Operand` intern table
* Optimize `HybridAllocator` a bit
* Add `Allocators` class
* Tune `ArenaAllocator` sizes
* Add page removal mechanism to `ArenaAllocator`
Remove pages which have not been used for more than 5s after each reset.
I am on fence if this would be better using a Gen2 callback object like
the one in System.Buffers.ArrayPool<T>, to trim the pool. Because right
now if a large translation happens, the pages will be freed only after a
reset. This reset may not happen for a while because no new translation
is hit, but the arena base sizes are rather small.
* Fix `OOM` when allocating larger than page size in `ArenaAllocator`
Tweak resizing mechanism for Operand.Uses and Assignemnts.
* Optimize `Optimizer` a bit
* Optimize `Operand.Add<T>/Remove<T>` a bit
* Clean up `PreAllocator`
* Fix phi insertion order
Reduce codegen diffs.
* Fix code alignment
* Use new heuristics for degree of parallelism
* Suppress warnings
* Address gdkchan's feedback
Renamed `GetValue()` to `GetValueUnsafe()` to make it more clear that
`Operand.Value` should usually not be modified directly.
* Add fast path to `ArenaAllocator`
* Assembly for `ArenaAllocator.Allocate(ulong)`:
.L0:
mov rax, [rcx+0x18]
lea r8, [rax+rdx]
cmp r8, [rcx+0x10]
ja short .L2
.L1:
mov rdx, [rcx+8]
add rax, [rdx+8]
mov [rcx+0x18], r8
ret
.L2:
jmp ArenaAllocator.AllocateSlow(UInt64)
A few variable/field had to be changed to ulong so that RyuJIT avoids
emitting zero-extends.
* Implement a new heuristic to free pooled pages.
If an arena is used often, it is more likely that its pages will be
needed, so the pages are kept for longer (e.g: during PPTC rebuild or
burst sof compilations). If is not used often, then it is more likely
that its pages will not be needed (e.g: after PPTC rebuild or bursts
of compilations).
* Address riperiperi's feedback
* Use `EqualityComparer<T>` in `IntrusiveList<T>`
Avoids a potential GC hole in `Equals(T, T)`.
2021-08-17 11:08:34 -07:00
|
|
|
|
using static ARMeilleure.IntermediateRepresentation.Operand.Factory;
|
POWER - Performance Optimizations With Extensive Ramifications (#2286)
* Refactoring of KMemoryManager class
* Replace some trivial uses of DRAM address with VA
* Get rid of GetDramAddressFromVa
* Abstracting more operations on derived page table class
* Run auto-format on KPageTableBase
* Managed to make TryConvertVaToPa private, few uses remains now
* Implement guest physical pages ref counting, remove manual freeing
* Make DoMmuOperation private and call new abstract methods only from the base class
* Pass pages count rather than size on Map/UnmapMemory
* Change memory managers to take host pointers
* Fix a guest memory leak and simplify KPageTable
* Expose new methods for host range query and mapping
* Some refactoring of MapPagesFromClientProcess to allow proper page ref counting and mapping without KPageLists
* Remove more uses of AddVaRangeToPageList, now only one remains (shared memory page checking)
* Add a SharedMemoryStorage class, will be useful for host mapping
* Sayonara AddVaRangeToPageList, you served us well
* Start to implement host memory mapping (WIP)
* Support memory tracking through host exception handling
* Fix some access violations from HLE service guest memory access and CPU
* Fix memory tracking
* Fix mapping list bugs, including a race and a error adding mapping ranges
* Simple page table for memory tracking
* Simple "volatile" region handle mode
* Update UBOs directly (experimental, rough)
* Fix the overlap check
* Only set non-modified buffers as volatile
* Fix some memory tracking issues
* Fix possible race in MapBufferFromClientProcess (block list updates were not locked)
* Write uniform update to memory immediately, only defer the buffer set.
* Fix some memory tracking issues
* Pass correct pages count on shared memory unmap
* Armeilleure Signal Handler v1 + Unix changes
Unix currently behaves like windows, rather than remapping physical
* Actually check if the host platform is unix
* Fix decommit on linux.
* Implement windows 10 placeholder shared memory, fix a buffer issue.
* Make PTC version something that will never match with master
* Remove testing variable for block count
* Add reference count for memory manager, fix dispose
Can still deadlock with OpenAL
* Add address validation, use page table for mapped check, add docs
Might clean up the page table traversing routines.
* Implement batched mapping/tracking.
* Move documentation, fix tests.
* Cleanup uniform buffer update stuff.
* Remove unnecessary assignment.
* Add unsafe host mapped memory switch
On by default. Would be good to turn this off for untrusted code (homebrew, exefs mods) and give the user the option to turn it on manually, though that requires some UI work.
* Remove C# exception handlers
They have issues due to current .NET limitations, so the meilleure one fully replaces them for now.
* Fix MapPhysicalMemory on the software MemoryManager.
* Null check for GetHostAddress, docs
* Add configuration for setting memory manager mode (not in UI yet)
* Add config to UI
* Fix type mismatch on Unix signal handler code emit
* Fix 6GB DRAM mode.
The size can be greater than `uint.MaxValue` when the DRAM is >4GB.
* Address some feedback.
* More detailed error if backing memory cannot be mapped.
* SetLastError on all OS functions for consistency
* Force pages dirty with UBO update instead of setting them directly.
Seems to be much faster across a few games. Need retesting.
* Rebase, configuration rework, fix mem tracking regression
* Fix race in FreePages
* Set memory managers null after decrementing ref count
* Remove readonly keyword, as this is now modified.
* Use a local variable for the signal handler rather than a register.
* Fix bug with buffer resize, and index/uniform buffer binding.
Should fix flickering in games.
* Add InvalidAccessHandler to MemoryTracking
Doesn't do anything yet
* Call invalid access handler on unmapped read/write.
Same rules as the regular memory manager.
* Make unsafe mapped memory its own MemoryManagerType
* Move FlushUboDirty into UpdateState.
* Buffer dirty cache, rather than ubo cache
Much cleaner, may be reusable for Inline2Memory updates.
* This doesn't return anything anymore.
* Add sigaction remove methods, correct a few function signatures.
* Return empty list of physical regions for size 0.
* Also on AddressSpaceManager
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2021-05-24 13:52:44 -07:00
|
|
|
|
|
|
|
|
|
namespace ARMeilleure.Signal
|
|
|
|
|
{
|
|
|
|
|
[StructLayout(LayoutKind.Sequential, Pack = 1)]
|
|
|
|
|
struct SignalHandlerRange
|
|
|
|
|
{
|
|
|
|
|
public int IsActive;
|
|
|
|
|
public nuint RangeAddress;
|
|
|
|
|
public nuint RangeEndAddress;
|
|
|
|
|
public IntPtr ActionPointer;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
[StructLayout(LayoutKind.Sequential, Pack = 1)]
|
|
|
|
|
struct SignalHandlerConfig
|
|
|
|
|
{
|
|
|
|
|
/// <summary>
|
|
|
|
|
/// The byte offset of the faulting address in the SigInfo or ExceptionRecord struct.
|
|
|
|
|
/// </summary>
|
|
|
|
|
public int StructAddressOffset;
|
|
|
|
|
|
|
|
|
|
/// <summary>
|
|
|
|
|
/// The byte offset of the write flag in the SigInfo or ExceptionRecord struct.
|
|
|
|
|
/// </summary>
|
|
|
|
|
public int StructWriteOffset;
|
|
|
|
|
|
|
|
|
|
/// <summary>
|
|
|
|
|
/// The sigaction handler that was registered before this one. (unix only)
|
|
|
|
|
/// </summary>
|
|
|
|
|
public nuint UnixOldSigaction;
|
|
|
|
|
|
|
|
|
|
/// <summary>
|
|
|
|
|
/// The type of the previous sigaction. True for the 3 argument variant. (unix only)
|
|
|
|
|
/// </summary>
|
|
|
|
|
public int UnixOldSigaction3Arg;
|
|
|
|
|
|
|
|
|
|
public SignalHandlerRange Range0;
|
|
|
|
|
public SignalHandlerRange Range1;
|
|
|
|
|
public SignalHandlerRange Range2;
|
|
|
|
|
public SignalHandlerRange Range3;
|
|
|
|
|
public SignalHandlerRange Range4;
|
|
|
|
|
public SignalHandlerRange Range5;
|
|
|
|
|
public SignalHandlerRange Range6;
|
|
|
|
|
public SignalHandlerRange Range7;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
public static class NativeSignalHandler
|
|
|
|
|
{
|
|
|
|
|
private delegate void UnixExceptionHandler(int sig, IntPtr info, IntPtr ucontext);
|
|
|
|
|
[UnmanagedFunctionPointer(CallingConvention.Winapi)]
|
|
|
|
|
private delegate int VectoredExceptionHandler(IntPtr exceptionInfo);
|
|
|
|
|
|
|
|
|
|
private const int MaxTrackedRanges = 8;
|
|
|
|
|
|
|
|
|
|
private const int StructAddressOffset = 0;
|
|
|
|
|
private const int StructWriteOffset = 4;
|
|
|
|
|
private const int UnixOldSigaction = 8;
|
|
|
|
|
private const int UnixOldSigaction3Arg = 16;
|
|
|
|
|
private const int RangeOffset = 20;
|
|
|
|
|
|
|
|
|
|
private const int EXCEPTION_CONTINUE_SEARCH = 0;
|
|
|
|
|
private const int EXCEPTION_CONTINUE_EXECUTION = -1;
|
|
|
|
|
|
|
|
|
|
private const uint EXCEPTION_ACCESS_VIOLATION = 0xc0000005;
|
|
|
|
|
|
|
|
|
|
private const ulong PageSize = 0x1000;
|
|
|
|
|
private const ulong PageMask = PageSize - 1;
|
|
|
|
|
|
|
|
|
|
private static IntPtr _handlerConfig;
|
|
|
|
|
private static IntPtr _signalHandlerPtr;
|
|
|
|
|
private static IntPtr _signalHandlerHandle;
|
|
|
|
|
|
|
|
|
|
private static readonly object _lock = new object();
|
|
|
|
|
private static bool _initialized;
|
|
|
|
|
|
|
|
|
|
static NativeSignalHandler()
|
|
|
|
|
{
|
|
|
|
|
_handlerConfig = Marshal.AllocHGlobal(Unsafe.SizeOf<SignalHandlerConfig>());
|
|
|
|
|
ref SignalHandlerConfig config = ref GetConfigRef();
|
|
|
|
|
|
|
|
|
|
config = new SignalHandlerConfig();
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
public static void InitializeSignalHandler()
|
|
|
|
|
{
|
|
|
|
|
if (_initialized) return;
|
|
|
|
|
|
|
|
|
|
lock (_lock)
|
|
|
|
|
{
|
|
|
|
|
if (_initialized) return;
|
|
|
|
|
|
|
|
|
|
bool unix = RuntimeInformation.IsOSPlatform(OSPlatform.Linux) || RuntimeInformation.IsOSPlatform(OSPlatform.OSX);
|
|
|
|
|
ref SignalHandlerConfig config = ref GetConfigRef();
|
|
|
|
|
|
|
|
|
|
if (unix)
|
|
|
|
|
{
|
|
|
|
|
// Unix siginfo struct locations.
|
|
|
|
|
// NOTE: These are incredibly likely to be different between kernel version and architectures.
|
|
|
|
|
|
|
|
|
|
config.StructAddressOffset = 16; // si_addr
|
|
|
|
|
config.StructWriteOffset = 8; // si_code
|
|
|
|
|
|
|
|
|
|
_signalHandlerPtr = Marshal.GetFunctionPointerForDelegate(GenerateUnixSignalHandler(_handlerConfig));
|
|
|
|
|
|
|
|
|
|
SigAction old = UnixSignalHandlerRegistration.RegisterExceptionHandler(_signalHandlerPtr);
|
|
|
|
|
config.UnixOldSigaction = (nuint)(ulong)old.sa_handler;
|
|
|
|
|
config.UnixOldSigaction3Arg = old.sa_flags & 4;
|
|
|
|
|
}
|
|
|
|
|
else
|
|
|
|
|
{
|
|
|
|
|
config.StructAddressOffset = 40; // ExceptionInformation1
|
|
|
|
|
config.StructWriteOffset = 32; // ExceptionInformation0
|
|
|
|
|
|
|
|
|
|
_signalHandlerPtr = Marshal.GetFunctionPointerForDelegate(GenerateWindowsSignalHandler(_handlerConfig));
|
|
|
|
|
|
|
|
|
|
_signalHandlerHandle = WindowsSignalHandlerRegistration.RegisterExceptionHandler(_signalHandlerPtr);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
_initialized = true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
private static unsafe ref SignalHandlerConfig GetConfigRef()
|
|
|
|
|
{
|
|
|
|
|
return ref Unsafe.AsRef<SignalHandlerConfig>((void*)_handlerConfig);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
public static unsafe bool AddTrackedRegion(nuint address, nuint endAddress, IntPtr action)
|
|
|
|
|
{
|
|
|
|
|
var ranges = &((SignalHandlerConfig*)_handlerConfig)->Range0;
|
|
|
|
|
|
|
|
|
|
for (int i = 0; i < MaxTrackedRanges; i++)
|
|
|
|
|
{
|
|
|
|
|
if (ranges[i].IsActive == 0)
|
|
|
|
|
{
|
|
|
|
|
ranges[i].RangeAddress = address;
|
|
|
|
|
ranges[i].RangeEndAddress = endAddress;
|
|
|
|
|
ranges[i].ActionPointer = action;
|
|
|
|
|
ranges[i].IsActive = 1;
|
|
|
|
|
|
|
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
public static unsafe bool RemoveTrackedRegion(nuint address)
|
|
|
|
|
{
|
|
|
|
|
var ranges = &((SignalHandlerConfig*)_handlerConfig)->Range0;
|
|
|
|
|
|
|
|
|
|
for (int i = 0; i < MaxTrackedRanges; i++)
|
|
|
|
|
{
|
|
|
|
|
if (ranges[i].IsActive == 1 && ranges[i].RangeAddress == address)
|
|
|
|
|
{
|
|
|
|
|
ranges[i].IsActive = 0;
|
|
|
|
|
|
|
|
|
|
return true;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
private static Operand EmitGenericRegionCheck(EmitterContext context, IntPtr signalStructPtr, Operand faultAddress, Operand isWrite)
|
|
|
|
|
{
|
|
|
|
|
Operand inRegionLocal = context.AllocateLocal(OperandType.I32);
|
|
|
|
|
context.Copy(inRegionLocal, Const(0));
|
|
|
|
|
|
|
|
|
|
Operand endLabel = Label();
|
|
|
|
|
|
|
|
|
|
for (int i = 0; i < MaxTrackedRanges; i++)
|
|
|
|
|
{
|
|
|
|
|
ulong rangeBaseOffset = (ulong)(RangeOffset + i * Unsafe.SizeOf<SignalHandlerRange>());
|
|
|
|
|
|
|
|
|
|
Operand nextLabel = Label();
|
|
|
|
|
|
|
|
|
|
Operand isActive = context.Load(OperandType.I32, Const((ulong)signalStructPtr + rangeBaseOffset));
|
|
|
|
|
|
|
|
|
|
context.BranchIfFalse(nextLabel, isActive);
|
|
|
|
|
|
|
|
|
|
Operand rangeAddress = context.Load(OperandType.I64, Const((ulong)signalStructPtr + rangeBaseOffset + 4));
|
|
|
|
|
Operand rangeEndAddress = context.Load(OperandType.I64, Const((ulong)signalStructPtr + rangeBaseOffset + 12));
|
|
|
|
|
|
|
|
|
|
// Is the fault address within this tracked region?
|
|
|
|
|
Operand inRange = context.BitwiseAnd(
|
|
|
|
|
context.ICompare(faultAddress, rangeAddress, Comparison.GreaterOrEqualUI),
|
|
|
|
|
context.ICompare(faultAddress, rangeEndAddress, Comparison.Less)
|
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
// Only call tracking if in range.
|
|
|
|
|
context.BranchIfFalse(nextLabel, inRange, BasicBlockFrequency.Cold);
|
|
|
|
|
|
|
|
|
|
context.Copy(inRegionLocal, Const(1));
|
|
|
|
|
Operand offset = context.BitwiseAnd(context.Subtract(faultAddress, rangeAddress), Const(~PageMask));
|
|
|
|
|
|
|
|
|
|
// Call the tracking action, with the pointer's relative offset to the base address.
|
|
|
|
|
Operand trackingActionPtr = context.Load(OperandType.I64, Const((ulong)signalStructPtr + rangeBaseOffset + 20));
|
Replace CacheResourceWrite with more general "precise" write (#2684)
* Replace CacheResourceWrite with more general "precise" write
The goal of CacheResourceWrite was to notify GPU resources when they were modified directly, by looking up the modified address/size in a structure and calling a method on each resource. The downside of this is that each resource cache has to be queried individually, they all have to implement their own way to do this, and it can only signal to resources using the same PhysicalMemory instance.
This PR adds the ability to signal a write as "precise" on the tracking, which signals a special handler (if present) which can be used to avoid unnecessary flush actions, or maybe even more. For buffers, precise writes specifically do not flush, and instead punch a hole in the modified range list to indicate that the data on GPU has been replaced.
The downside is that precise actions must ignore the page protection bits and always signal - as they need to notify the target resource to ignore the sequence number optimization.
I had to reintroduce the sequence number increment after I2M, as removing it was causing issues in rabbids kingdom battle. However - all resources modified by I2M are notified directly to lower their sequence number, so the problem is likely that another unrelated resource is not being properly updated. Thankfully, doing this does not affect performance in the games I tested.
This should fix regressions from #2624. Test any games that were broken by that. (RF4, rabbids kingdom battle)
I've also added a sequence number increment to ThreedClass.IncrementSyncpoint, as it seems to fix buffer corruption in OpenGL homebrew. (this was a regression from removing sequence number increment from constant buffer update - another unrelated resource thing)
* Add tests.
* Add XML docs for GpuRegionHandle
* Skip UpdateProtection if only precise actions were called
This allows precise actions to skip reprotection costs.
2021-09-28 17:27:03 -07:00
|
|
|
|
context.Call(trackingActionPtr, OperandType.I32, offset, Const(PageSize), isWrite, Const(0));
|
POWER - Performance Optimizations With Extensive Ramifications (#2286)
* Refactoring of KMemoryManager class
* Replace some trivial uses of DRAM address with VA
* Get rid of GetDramAddressFromVa
* Abstracting more operations on derived page table class
* Run auto-format on KPageTableBase
* Managed to make TryConvertVaToPa private, few uses remains now
* Implement guest physical pages ref counting, remove manual freeing
* Make DoMmuOperation private and call new abstract methods only from the base class
* Pass pages count rather than size on Map/UnmapMemory
* Change memory managers to take host pointers
* Fix a guest memory leak and simplify KPageTable
* Expose new methods for host range query and mapping
* Some refactoring of MapPagesFromClientProcess to allow proper page ref counting and mapping without KPageLists
* Remove more uses of AddVaRangeToPageList, now only one remains (shared memory page checking)
* Add a SharedMemoryStorage class, will be useful for host mapping
* Sayonara AddVaRangeToPageList, you served us well
* Start to implement host memory mapping (WIP)
* Support memory tracking through host exception handling
* Fix some access violations from HLE service guest memory access and CPU
* Fix memory tracking
* Fix mapping list bugs, including a race and a error adding mapping ranges
* Simple page table for memory tracking
* Simple "volatile" region handle mode
* Update UBOs directly (experimental, rough)
* Fix the overlap check
* Only set non-modified buffers as volatile
* Fix some memory tracking issues
* Fix possible race in MapBufferFromClientProcess (block list updates were not locked)
* Write uniform update to memory immediately, only defer the buffer set.
* Fix some memory tracking issues
* Pass correct pages count on shared memory unmap
* Armeilleure Signal Handler v1 + Unix changes
Unix currently behaves like windows, rather than remapping physical
* Actually check if the host platform is unix
* Fix decommit on linux.
* Implement windows 10 placeholder shared memory, fix a buffer issue.
* Make PTC version something that will never match with master
* Remove testing variable for block count
* Add reference count for memory manager, fix dispose
Can still deadlock with OpenAL
* Add address validation, use page table for mapped check, add docs
Might clean up the page table traversing routines.
* Implement batched mapping/tracking.
* Move documentation, fix tests.
* Cleanup uniform buffer update stuff.
* Remove unnecessary assignment.
* Add unsafe host mapped memory switch
On by default. Would be good to turn this off for untrusted code (homebrew, exefs mods) and give the user the option to turn it on manually, though that requires some UI work.
* Remove C# exception handlers
They have issues due to current .NET limitations, so the meilleure one fully replaces them for now.
* Fix MapPhysicalMemory on the software MemoryManager.
* Null check for GetHostAddress, docs
* Add configuration for setting memory manager mode (not in UI yet)
* Add config to UI
* Fix type mismatch on Unix signal handler code emit
* Fix 6GB DRAM mode.
The size can be greater than `uint.MaxValue` when the DRAM is >4GB.
* Address some feedback.
* More detailed error if backing memory cannot be mapped.
* SetLastError on all OS functions for consistency
* Force pages dirty with UBO update instead of setting them directly.
Seems to be much faster across a few games. Need retesting.
* Rebase, configuration rework, fix mem tracking regression
* Fix race in FreePages
* Set memory managers null after decrementing ref count
* Remove readonly keyword, as this is now modified.
* Use a local variable for the signal handler rather than a register.
* Fix bug with buffer resize, and index/uniform buffer binding.
Should fix flickering in games.
* Add InvalidAccessHandler to MemoryTracking
Doesn't do anything yet
* Call invalid access handler on unmapped read/write.
Same rules as the regular memory manager.
* Make unsafe mapped memory its own MemoryManagerType
* Move FlushUboDirty into UpdateState.
* Buffer dirty cache, rather than ubo cache
Much cleaner, may be reusable for Inline2Memory updates.
* This doesn't return anything anymore.
* Add sigaction remove methods, correct a few function signatures.
* Return empty list of physical regions for size 0.
* Also on AddressSpaceManager
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2021-05-24 13:52:44 -07:00
|
|
|
|
|
|
|
|
|
context.Branch(endLabel);
|
|
|
|
|
|
|
|
|
|
context.MarkLabel(nextLabel);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
context.MarkLabel(endLabel);
|
|
|
|
|
|
|
|
|
|
return context.Copy(inRegionLocal);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
private static UnixExceptionHandler GenerateUnixSignalHandler(IntPtr signalStructPtr)
|
|
|
|
|
{
|
|
|
|
|
EmitterContext context = new EmitterContext();
|
|
|
|
|
|
|
|
|
|
// (int sig, SigInfo* sigInfo, void* ucontext)
|
|
|
|
|
Operand sigInfoPtr = context.LoadArgument(OperandType.I64, 1);
|
|
|
|
|
|
|
|
|
|
Operand structAddressOffset = context.Load(OperandType.I64, Const((ulong)signalStructPtr + StructAddressOffset));
|
|
|
|
|
Operand structWriteOffset = context.Load(OperandType.I64, Const((ulong)signalStructPtr + StructWriteOffset));
|
|
|
|
|
|
|
|
|
|
Operand faultAddress = context.Load(OperandType.I64, context.Add(sigInfoPtr, context.ZeroExtend32(OperandType.I64, structAddressOffset)));
|
|
|
|
|
Operand writeFlag = context.Load(OperandType.I64, context.Add(sigInfoPtr, context.ZeroExtend32(OperandType.I64, structWriteOffset)));
|
|
|
|
|
|
|
|
|
|
Operand isWrite = context.ICompareNotEqual(writeFlag, Const(0L)); // Normalize to 0/1.
|
|
|
|
|
|
|
|
|
|
Operand isInRegion = EmitGenericRegionCheck(context, signalStructPtr, faultAddress, isWrite);
|
|
|
|
|
|
|
|
|
|
Operand endLabel = Label();
|
|
|
|
|
|
|
|
|
|
context.BranchIfTrue(endLabel, isInRegion);
|
|
|
|
|
|
|
|
|
|
Operand unixOldSigaction = context.Load(OperandType.I64, Const((ulong)signalStructPtr + UnixOldSigaction));
|
|
|
|
|
Operand unixOldSigaction3Arg = context.Load(OperandType.I64, Const((ulong)signalStructPtr + UnixOldSigaction3Arg));
|
|
|
|
|
Operand threeArgLabel = Label();
|
|
|
|
|
|
|
|
|
|
context.BranchIfTrue(threeArgLabel, unixOldSigaction3Arg);
|
|
|
|
|
|
|
|
|
|
context.Call(unixOldSigaction, OperandType.None, context.LoadArgument(OperandType.I32, 0));
|
|
|
|
|
context.Branch(endLabel);
|
|
|
|
|
|
|
|
|
|
context.MarkLabel(threeArgLabel);
|
|
|
|
|
|
|
|
|
|
context.Call(unixOldSigaction,
|
|
|
|
|
OperandType.None,
|
|
|
|
|
context.LoadArgument(OperandType.I32, 0),
|
|
|
|
|
sigInfoPtr,
|
|
|
|
|
context.LoadArgument(OperandType.I64, 2)
|
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
context.MarkLabel(endLabel);
|
|
|
|
|
|
|
|
|
|
context.Return();
|
|
|
|
|
|
|
|
|
|
ControlFlowGraph cfg = context.GetControlFlowGraph();
|
|
|
|
|
|
|
|
|
|
OperandType[] argTypes = new OperandType[] { OperandType.I32, OperandType.I64, OperandType.I64 };
|
|
|
|
|
|
2021-09-13 16:23:37 -07:00
|
|
|
|
return Compiler.Compile(cfg, argTypes, OperandType.None, CompilerOptions.HighCq).Map<UnixExceptionHandler>();
|
POWER - Performance Optimizations With Extensive Ramifications (#2286)
* Refactoring of KMemoryManager class
* Replace some trivial uses of DRAM address with VA
* Get rid of GetDramAddressFromVa
* Abstracting more operations on derived page table class
* Run auto-format on KPageTableBase
* Managed to make TryConvertVaToPa private, few uses remains now
* Implement guest physical pages ref counting, remove manual freeing
* Make DoMmuOperation private and call new abstract methods only from the base class
* Pass pages count rather than size on Map/UnmapMemory
* Change memory managers to take host pointers
* Fix a guest memory leak and simplify KPageTable
* Expose new methods for host range query and mapping
* Some refactoring of MapPagesFromClientProcess to allow proper page ref counting and mapping without KPageLists
* Remove more uses of AddVaRangeToPageList, now only one remains (shared memory page checking)
* Add a SharedMemoryStorage class, will be useful for host mapping
* Sayonara AddVaRangeToPageList, you served us well
* Start to implement host memory mapping (WIP)
* Support memory tracking through host exception handling
* Fix some access violations from HLE service guest memory access and CPU
* Fix memory tracking
* Fix mapping list bugs, including a race and a error adding mapping ranges
* Simple page table for memory tracking
* Simple "volatile" region handle mode
* Update UBOs directly (experimental, rough)
* Fix the overlap check
* Only set non-modified buffers as volatile
* Fix some memory tracking issues
* Fix possible race in MapBufferFromClientProcess (block list updates were not locked)
* Write uniform update to memory immediately, only defer the buffer set.
* Fix some memory tracking issues
* Pass correct pages count on shared memory unmap
* Armeilleure Signal Handler v1 + Unix changes
Unix currently behaves like windows, rather than remapping physical
* Actually check if the host platform is unix
* Fix decommit on linux.
* Implement windows 10 placeholder shared memory, fix a buffer issue.
* Make PTC version something that will never match with master
* Remove testing variable for block count
* Add reference count for memory manager, fix dispose
Can still deadlock with OpenAL
* Add address validation, use page table for mapped check, add docs
Might clean up the page table traversing routines.
* Implement batched mapping/tracking.
* Move documentation, fix tests.
* Cleanup uniform buffer update stuff.
* Remove unnecessary assignment.
* Add unsafe host mapped memory switch
On by default. Would be good to turn this off for untrusted code (homebrew, exefs mods) and give the user the option to turn it on manually, though that requires some UI work.
* Remove C# exception handlers
They have issues due to current .NET limitations, so the meilleure one fully replaces them for now.
* Fix MapPhysicalMemory on the software MemoryManager.
* Null check for GetHostAddress, docs
* Add configuration for setting memory manager mode (not in UI yet)
* Add config to UI
* Fix type mismatch on Unix signal handler code emit
* Fix 6GB DRAM mode.
The size can be greater than `uint.MaxValue` when the DRAM is >4GB.
* Address some feedback.
* More detailed error if backing memory cannot be mapped.
* SetLastError on all OS functions for consistency
* Force pages dirty with UBO update instead of setting them directly.
Seems to be much faster across a few games. Need retesting.
* Rebase, configuration rework, fix mem tracking regression
* Fix race in FreePages
* Set memory managers null after decrementing ref count
* Remove readonly keyword, as this is now modified.
* Use a local variable for the signal handler rather than a register.
* Fix bug with buffer resize, and index/uniform buffer binding.
Should fix flickering in games.
* Add InvalidAccessHandler to MemoryTracking
Doesn't do anything yet
* Call invalid access handler on unmapped read/write.
Same rules as the regular memory manager.
* Make unsafe mapped memory its own MemoryManagerType
* Move FlushUboDirty into UpdateState.
* Buffer dirty cache, rather than ubo cache
Much cleaner, may be reusable for Inline2Memory updates.
* This doesn't return anything anymore.
* Add sigaction remove methods, correct a few function signatures.
* Return empty list of physical regions for size 0.
* Also on AddressSpaceManager
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2021-05-24 13:52:44 -07:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
private static VectoredExceptionHandler GenerateWindowsSignalHandler(IntPtr signalStructPtr)
|
|
|
|
|
{
|
|
|
|
|
EmitterContext context = new EmitterContext();
|
|
|
|
|
|
|
|
|
|
// (ExceptionPointers* exceptionInfo)
|
|
|
|
|
Operand exceptionInfoPtr = context.LoadArgument(OperandType.I64, 0);
|
|
|
|
|
Operand exceptionRecordPtr = context.Load(OperandType.I64, exceptionInfoPtr);
|
|
|
|
|
|
|
|
|
|
// First thing's first - this catches a number of exceptions, but we only want access violations.
|
|
|
|
|
Operand validExceptionLabel = Label();
|
|
|
|
|
|
|
|
|
|
Operand exceptionCode = context.Load(OperandType.I32, exceptionRecordPtr);
|
|
|
|
|
|
|
|
|
|
context.BranchIf(validExceptionLabel, exceptionCode, Const(EXCEPTION_ACCESS_VIOLATION), Comparison.Equal);
|
|
|
|
|
|
|
|
|
|
context.Return(Const(EXCEPTION_CONTINUE_SEARCH)); // Don't handle this one.
|
|
|
|
|
|
|
|
|
|
context.MarkLabel(validExceptionLabel);
|
|
|
|
|
|
|
|
|
|
// Next, read the address of the invalid access, and whether it is a write or not.
|
|
|
|
|
|
|
|
|
|
Operand structAddressOffset = context.Load(OperandType.I32, Const((ulong)signalStructPtr + StructAddressOffset));
|
|
|
|
|
Operand structWriteOffset = context.Load(OperandType.I32, Const((ulong)signalStructPtr + StructWriteOffset));
|
|
|
|
|
|
|
|
|
|
Operand faultAddress = context.Load(OperandType.I64, context.Add(exceptionRecordPtr, context.ZeroExtend32(OperandType.I64, structAddressOffset)));
|
|
|
|
|
Operand writeFlag = context.Load(OperandType.I64, context.Add(exceptionRecordPtr, context.ZeroExtend32(OperandType.I64, structWriteOffset)));
|
|
|
|
|
|
|
|
|
|
Operand isWrite = context.ICompareNotEqual(writeFlag, Const(0L)); // Normalize to 0/1.
|
|
|
|
|
|
|
|
|
|
Operand isInRegion = EmitGenericRegionCheck(context, signalStructPtr, faultAddress, isWrite);
|
|
|
|
|
|
|
|
|
|
Operand endLabel = Label();
|
|
|
|
|
|
|
|
|
|
// If the region check result is false, then run the next vectored exception handler.
|
|
|
|
|
|
|
|
|
|
context.BranchIfTrue(endLabel, isInRegion);
|
|
|
|
|
|
|
|
|
|
context.Return(Const(EXCEPTION_CONTINUE_SEARCH));
|
|
|
|
|
|
|
|
|
|
context.MarkLabel(endLabel);
|
|
|
|
|
|
|
|
|
|
// Otherwise, return to execution.
|
|
|
|
|
|
|
|
|
|
context.Return(Const(EXCEPTION_CONTINUE_EXECUTION));
|
|
|
|
|
|
|
|
|
|
// Compile and return the function.
|
|
|
|
|
|
|
|
|
|
ControlFlowGraph cfg = context.GetControlFlowGraph();
|
|
|
|
|
|
|
|
|
|
OperandType[] argTypes = new OperandType[] { OperandType.I64 };
|
|
|
|
|
|
2021-09-13 16:23:37 -07:00
|
|
|
|
return Compiler.Compile(cfg, argTypes, OperandType.I32, CompilerOptions.HighCq).Map<VectoredExceptionHandler>();
|
POWER - Performance Optimizations With Extensive Ramifications (#2286)
* Refactoring of KMemoryManager class
* Replace some trivial uses of DRAM address with VA
* Get rid of GetDramAddressFromVa
* Abstracting more operations on derived page table class
* Run auto-format on KPageTableBase
* Managed to make TryConvertVaToPa private, few uses remains now
* Implement guest physical pages ref counting, remove manual freeing
* Make DoMmuOperation private and call new abstract methods only from the base class
* Pass pages count rather than size on Map/UnmapMemory
* Change memory managers to take host pointers
* Fix a guest memory leak and simplify KPageTable
* Expose new methods for host range query and mapping
* Some refactoring of MapPagesFromClientProcess to allow proper page ref counting and mapping without KPageLists
* Remove more uses of AddVaRangeToPageList, now only one remains (shared memory page checking)
* Add a SharedMemoryStorage class, will be useful for host mapping
* Sayonara AddVaRangeToPageList, you served us well
* Start to implement host memory mapping (WIP)
* Support memory tracking through host exception handling
* Fix some access violations from HLE service guest memory access and CPU
* Fix memory tracking
* Fix mapping list bugs, including a race and a error adding mapping ranges
* Simple page table for memory tracking
* Simple "volatile" region handle mode
* Update UBOs directly (experimental, rough)
* Fix the overlap check
* Only set non-modified buffers as volatile
* Fix some memory tracking issues
* Fix possible race in MapBufferFromClientProcess (block list updates were not locked)
* Write uniform update to memory immediately, only defer the buffer set.
* Fix some memory tracking issues
* Pass correct pages count on shared memory unmap
* Armeilleure Signal Handler v1 + Unix changes
Unix currently behaves like windows, rather than remapping physical
* Actually check if the host platform is unix
* Fix decommit on linux.
* Implement windows 10 placeholder shared memory, fix a buffer issue.
* Make PTC version something that will never match with master
* Remove testing variable for block count
* Add reference count for memory manager, fix dispose
Can still deadlock with OpenAL
* Add address validation, use page table for mapped check, add docs
Might clean up the page table traversing routines.
* Implement batched mapping/tracking.
* Move documentation, fix tests.
* Cleanup uniform buffer update stuff.
* Remove unnecessary assignment.
* Add unsafe host mapped memory switch
On by default. Would be good to turn this off for untrusted code (homebrew, exefs mods) and give the user the option to turn it on manually, though that requires some UI work.
* Remove C# exception handlers
They have issues due to current .NET limitations, so the meilleure one fully replaces them for now.
* Fix MapPhysicalMemory on the software MemoryManager.
* Null check for GetHostAddress, docs
* Add configuration for setting memory manager mode (not in UI yet)
* Add config to UI
* Fix type mismatch on Unix signal handler code emit
* Fix 6GB DRAM mode.
The size can be greater than `uint.MaxValue` when the DRAM is >4GB.
* Address some feedback.
* More detailed error if backing memory cannot be mapped.
* SetLastError on all OS functions for consistency
* Force pages dirty with UBO update instead of setting them directly.
Seems to be much faster across a few games. Need retesting.
* Rebase, configuration rework, fix mem tracking regression
* Fix race in FreePages
* Set memory managers null after decrementing ref count
* Remove readonly keyword, as this is now modified.
* Use a local variable for the signal handler rather than a register.
* Fix bug with buffer resize, and index/uniform buffer binding.
Should fix flickering in games.
* Add InvalidAccessHandler to MemoryTracking
Doesn't do anything yet
* Call invalid access handler on unmapped read/write.
Same rules as the regular memory manager.
* Make unsafe mapped memory its own MemoryManagerType
* Move FlushUboDirty into UpdateState.
* Buffer dirty cache, rather than ubo cache
Much cleaner, may be reusable for Inline2Memory updates.
* This doesn't return anything anymore.
* Add sigaction remove methods, correct a few function signatures.
* Return empty list of physical regions for size 0.
* Also on AddressSpaceManager
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2021-05-24 13:52:44 -07:00
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|