The original code, removed in the "rewrite" patch, was incorrect for
large structures, and required dynamic allocation of a trampoline on
every ffi_call.
Instead, allocate a 4k entry table of all possible structure returns.
The table is 80k, but is read-only and dynamically paged, which ought
to be better than allocating the trampoline.
This is difficult to test with gcc. One can only use -O0 at present.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63668.
It's impossible to call between v8 and v9 ABIs, because of the stack bias
in the v9 ABI. So let's not pretend it's just not implemented yet. Split
the v9 code out to a separate file.
The register windows prevent ffi_call from setting up the entire stack
frame the assembly, but we needn't make an indirect call back to prep_args.
Assembly to use local labels, .type annotation, hidden annotation.
I do retain the _prefix for the symbols, but given that it wasn't
done consistently across all symbols, I doubt it's actually needed.
Do not modify the ffi_type. Rearrange the tests so that we
quickly eliminate structures that cannot match. Return an
encoded value of element count and base type.
Unties the backend from changes to FFI_TYPE_* constants, and allows
compilation to succeed after the addition of FFI_TYPE_COMPLEX.
Delete the hand-written unwind info.
Avoid false abstraction, like get_x_addr. Avoid recomputing data
about the type being manipulated. Use NEON insns for HFA manipulation.
Note that some of the inline assembly will go away in a subsequent patch.
The iOS abi doesn't require padding between arguments, but
that's not what AARCH64_STACK_ALIGN meant. The hardware will
in fact trap if the SP register is not 16 byte aligned.
The set of functions get_homogeneous_type, element_count, and is_hfa
are all intertwined and recompute data. Return a compound quantity
from is_hfa that contains all the data and avoids the recomputation.
Move everything into sysv.S, removing win32.S and freebsd.S.
Handle all abis with a single ffi_closure_inner function.
Move complexity of the raw THISCALL trampoline into assembly
instead of the trampoline itself.
Only push the context for the REGISTER abi; let the rest
receive it in a register.
Decouple the assembly from FFI_TYPE_*. Merge prep_args with ffi_call,
passing the frame and the stack to the assembly.
Note that this patch isn't really standalone, as this breaks closures.