Preface

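The next several articles form a series on Builtins. Builtins implement a large share of V8's core functionality, which shows how important they are. Most Builtins, however, are written in CSA (CodeStubAssembler) or TQ (Torque). Both resemble assembly language, which makes the source hard to read. Worse, Builtins cannot be stepped through at source level while V8 is running, which makes them harder still to learn. This series therefore explains in detail how to study and debug Builtins, in the hope of offering a useful starting point.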

Abstract

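This is the first article in the Builtin series. It explains what Built-in Functions (Builtins) are and how they are initialized. As V8's built-in functionality, Builtins implement many important features, such as Ignition, the bytecode handlers, and the JavaScript API. Learning Builtins therefore helps in understanding V8's execution logic: you can see, for example, how bytecode is executed and how the string substring method is implemented. The article covers how Builtins are implemented (Section 2) and how they are initialized (Section 3).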

How Builtins Are Implemented

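Builtins can be implemented in platform-dependent assembly language, C++, JavaScript, CodeStubAssembler, or Torque. The five approaches differ markedly in ease of use and in performance. The official documentation (v8.dev/docs/torque) describes them as follows: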

(1) Platform-dependent assembly language: can be highly efficient, but need manual ports to all platforms and are difficult to maintain.

(2) C++: very similar in style to runtime functions and have access to V8’s powerful runtime functionality, but usually not suited to performance-sensitive areas.

(3) JavaScript: concise and readable code, access to fast intrinsics, but frequent usage of slow runtime calls, subject to unpredictable performance through type pollution, and subtle issues around (complicated and non-obvious) JS semantics. Javascript builtins are deprecated and should not be added anymore.

(4) CodeStubAssembler: provides efficient low-level functionality that is very close to assembly language while remaining platform-independent and preserving readability.

(5) V8 Torque: is a V8-specific domain-specific language that is translated to CodeStubAssembler. As such, it extends upon CodeStubAssembler and offers static typing as well as readable and expressive syntax.

Torque is an improved version of CodeStubAssembler. Its goal is to make Builtins easier to develop without giving up performance.

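Figure 1 (from the official documentation) shows how a Builtin is created with Torque.

First, the file.tq written by the developer is translated by the Torque compiler into -tq-csa.cc/.h files;

Next, the -tq-csa.cc/.h files are compiled into the mksnapshot executable;

Finally, mksnapshot generates snapshot.bin, which stores the Builtins as a binary sequence.

To stress the point again: the *-tq-csa.cc/.h files are Builtin source code that the Torque compiler generates from file.tq.

When V8 loads the snapshot file by deserialization, no symbol table is available, so the Torque Builtin source cannot be seen while debugging V8. CodeStubAssembler Builtins are also stored in snapshot.bin, so their source is not visible during debugging either. For how to debug them, refer to mksnapshot; my own debugging method is described below.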

Builtin Initialization

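Before walking through the source, one caveat: the debugging method here uses V8 7.9 built with v8_use_snapshot = false, because newer versions no longer support v8_use_snapshot = false and Builtin initialization cannot be debugged there. Setting v8_use_snapshot = false disables snapshot.bin, which means V8 creates and initializes the Builtins from C++ source at startup, and that is exactly what we want to observe.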

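In my view the C++, CodeStubAssembler, and Torque Builtins matter most, because core features such as Ignition, the bytecode handlers, and the JavaScript API are mostly implemented with these three kinds. They are described in detail below. The entry point for Builtin initialization is: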

bool Isolate::InitWithoutSnapshot() { return Init(nullptr, nullptr); }

As the name InitWithoutSnapshot() suggests, snapshot.bin is disabled. InitWithoutSnapshot() runs the following code:

1.  bool Isolate::Init(ReadOnlyDeserializer* read_only_deserializer,
2.                     StartupDeserializer* startup_deserializer) {
3.  //..............omitted...............
4.    bootstrapper_->Initialize(create_heap_objects);
5.    if (FLAG_embedded_builtins && create_heap_objects) {
6.      builtins_constants_table_builder_ = new BuiltinsConstantsTableBuilder(this);
7.    }
8.    setup_delegate_->SetupBuiltins(this);
9.    if (FLAG_embedded_builtins && create_heap_objects) {
10.      builtins_constants_table_builder_->Finalize();
11.      delete builtins_constants_table_builder_;
12.      builtins_constants_table_builder_ = nullptr;
13.      CreateAndSetEmbeddedBlob();
14.    }
15.  //..............omitted...............
16.    return true;
17.  }

Line 8 above enters SetupBuiltins(), which calls SetupBuiltinsInternal() to complete Builtin initialization. The source of SetupBuiltinsInternal() is:

1.  void SetupIsolateDelegate::SetupBuiltinsInternal(Isolate* isolate) {
2.    Builtins* builtins = isolate->builtins();
3.  //omitted...................
4.    int index = 0;
5.    Code code;
6.  #define BUILD_CPP(Name)                                                      \
7.    code = BuildAdaptor(isolate, index, FUNCTION_ADDR(Builtin_##Name), #Name); \
8.    AddBuiltin(builtins, index++, code);
9.  #define BUILD_TFJ(Name, Argc, ...)                              \
10.    code = BuildWithCodeStubAssemblerJS(                          \
11.        isolate, index, &Builtins::Generate_##Name, Argc, #Name); \
12.    AddBuiltin(builtins, index++, code);
13.  #define BUILD_TFC(Name, InterfaceDescriptor)                      \
14.    /* Return size is from the provided CallInterfaceDescriptor. */ \
15.    code = BuildWithCodeStubAssemblerCS(                            \
16.        isolate, index, &Builtins::Generate_##Name,                 \
17.        CallDescriptors::InterfaceDescriptor, #Name);               \
18.    AddBuiltin(builtins, index++, code);
19.  #define BUILD_TFS(Name, ...)                                                   \
20.    /* Return size for generic TF builtins (stub linkage) is always 1. */        \
21.    code =                                                                       \
22.        BuildWithCodeStubAssemblerCS(isolate, index, &Builtins::Generate_##Name, \
23.                                     CallDescriptors::Name, #Name);              \
24.    AddBuiltin(builtins, index++, code);
25.  #define BUILD_TFH(Name, InterfaceDescriptor)              \
26.    /* Return size for IC builtins/handlers is always 1. */ \
27.    code = BuildWithCodeStubAssemblerCS(                    \
28.        isolate, index, &Builtins::Generate_##Name,         \
29.        CallDescriptors::InterfaceDescriptor, #Name);       \
30.    AddBuiltin(builtins, index++, code);
31.  #define BUILD_BCH(Name, OperandScale, Bytecode)                           \
32.    code = GenerateBytecodeHandler(isolate, index, OperandScale, Bytecode); \
33.    AddBuiltin(builtins, index++, code);
34.  #define BUILD_ASM(Name, InterfaceDescriptor)                                \
35.    code = BuildWithMacroAssembler(isolate, index, Builtins::Generate_##Name, \
36.                                   #Name);                                    \
37.    AddBuiltin(builtins, index++, code);
38.    BUILTIN_LIST(BUILD_CPP, BUILD_TFJ, BUILD_TFC, BUILD_TFS, BUILD_TFH, BUILD_BCH,
39.                 BUILD_ASM);
40.  //omitted...........................
41.  }

The three core aspects of SetupBuiltinsInternal() are explained below:

(1) BUILD_CPP, BUILD_TFJ, BUILD_TFC, BUILD_TFS, BUILD_TFH, BUILD_BCH, and BUILD_ASM distinguish Builtins by kind. The source comments describe them as follows:

// CPP: Builtin in C++. Entered via BUILTIN_EXIT frame.
//      Args: name
// TFJ: Builtin in Turbofan, with JS linkage (callable as Javascript function).
//      Args: name, arguments count, explicit argument names...
// TFS: Builtin in Turbofan, with CodeStub linkage.
//      Args: name, explicit argument names...
// TFC: Builtin in Turbofan, with CodeStub linkage and custom descriptor.
//      Args: name, interface descriptor
// TFH: Handlers in Turbofan, with CodeStub linkage.
//      Args: name, interface descriptor
// BCH: Bytecode Handlers, with bytecode dispatch linkage.
//      Args: name, OperandScale, Bytecode
// ASM: Builtin in platform-dependent assembly.
//      Args: name, interface descriptor

(2) Line 38 of SetupBuiltinsInternal(), BUILTIN_LIST, defines all the Builtins. Its source is:

1.  #define BUILTIN_LIST(CPP, TFJ, TFC, TFS, TFH, BCH, ASM)  \
2.    BUILTIN_LIST_BASE(CPP, TFJ, TFC, TFS, TFH, ASM)        \
3.    BUILTIN_LIST_FROM_TORQUE(CPP, TFJ, TFC, TFS, TFH, ASM) \
4.    BUILTIN_LIST_INTL(CPP, TFJ, TFS)                       \
5.    BUILTIN_LIST_BYTECODE_HANDLERS(BCH)
6.  //================separator=================================
7.  #define BUILTIN_LIST_FROM_TORQUE(CPP, TFJ, TFC, TFS, TFH, ASM) \
8.  //...............omitted............................
9.  TFJ(StringPrototypeToString, 0, kReceiver) \
10.  TFJ(StringPrototypeValueOf, 0, kReceiver) \
11.  TFS(StringToList, kString) \
12.  TFJ(StringPrototypeCharAt, 1, kReceiver, kPosition) \
13.  TFJ(StringPrototypeCharCodeAt, 1, kReceiver, kPosition) \
14.  TFJ(StringPrototypeCodePointAt, 1, kReceiver, kPosition) \
15.  TFJ(StringPrototypeConcat, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \
16.  TFJ(StringConstructor, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \
17.  TFS(StringAddConvertLeft, kLeft, kRight) \
18.  TFS(StringAddConvertRight, kLeft, kRight) \
19.  TFJ(StringPrototypeEndsWith, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \
20.  TFS(CreateHTML, kReceiver, kMethodName, kTagName, kAttr, kAttrValue) \
21.  TFJ(StringPrototypeAnchor, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \
22.  TFJ(StringPrototypeBig, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \
23.  TFJ(StringPrototypeIterator, 0, kReceiver) \
24.  TFJ(StringIteratorPrototypeNext, 0, kReceiver) \
25.  TFJ(StringPrototypePadStart, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \
26.  TFJ(StringPrototypePadEnd, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \
27.  TFS(StringRepeat, kString, kCount) \
28.  TFJ(StringPrototypeRepeat, 1, kReceiver, kCount) \
29.  TFJ(StringPrototypeSlice, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \
30.  TFJ(StringPrototypeStartsWith, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \
31.  TFJ(StringPrototypeSubstring, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \

Reading BUILTIN_LIST together with BUILTIN_LIST_FROM_TORQUE shows the names of all the Builtins. Lines 9-31 list the Builtins that implement string methods; for example, the Builtin for substring is StringPrototypeSubstring.
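How the position of an entry in such a list becomes its index can be illustrated with a small self-contained sketch. This is not V8 code: every identifier beginning with Toy or TOY is hypothetical, ToyCppEntry is an invented entry, and the list is only four entries long, so the indices here do not correspond to real V8 Builtin indices. The other three names are borrowed from the listing above purely for familiarity.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical miniature list in the style of BUILTIN_LIST.
// Each entry names its kind (CPP, TFJ, TFS) and its arguments.
#define TOY_BUILTIN_LIST(CPP, TFJ, TFS)               \
  CPP(ToyCppEntry)                                    \
  TFJ(StringPrototypeCharAt, 1, kReceiver, kPosition) \
  TFS(StringRepeat, kString, kCount)                  \
  TFJ(StringPrototypeSubstring, 0, kReceiver)

// Expansion 1: one enumerator per entry, numbered by list position.
#define TOY_ENUM_ONE(Name) k##Name,
#define TOY_ENUM_VAR(Name, ...) k##Name,
enum ToyBuiltinId : int {
  TOY_BUILTIN_LIST(TOY_ENUM_ONE, TOY_ENUM_VAR, TOY_ENUM_VAR) kToyBuiltinCount
};

// Expansion 2: the same list, this time producing the entry names as strings.
#define TOY_NAME_ONE(Name) #Name,
#define TOY_NAME_VAR(Name, ...) #Name,
const char* const kToyBuiltinNames[] = {
    TOY_BUILTIN_LIST(TOY_NAME_ONE, TOY_NAME_VAR, TOY_NAME_VAR)};

// Returns the list position of `name`, or -1 if it is not in the toy list.
int ToyBuiltinIndexOf(const char* name) {
  assert(name != nullptr);
  for (int i = 0; i < kToyBuiltinCount; ++i) {
    if (std::strcmp(kToyBuiltinNames[i], name) == 0) return i;
  }
  return -1;
}
```

In this toy list ToyBuiltinIndexOf("StringPrototypeSubstring") returns 3, simply because the entry is fourth in the list. The real index of a Builtin has to be read from the expanded BUILTIN_LIST of the V8 version being debugged.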

(3) The seven macros (BUILD_CPP, BUILD_TFJ, and so on) work together with BUILTIN_LIST to initialize every Builtin. Taking BUILD_CPP in SetupBuiltinsInternal() as an example, the relevant source is:

  int index = 0;
  Code code;
#define BUILD_CPP(Name)                                                      \
  code = BuildAdaptor(isolate, index, FUNCTION_ADDR(Builtin_##Name), #Name); \
  AddBuiltin(builtins, index++, code);
//...................separator.................
// FUNCTION_ADDR(f) gets the address of a C function f.
#define FUNCTION_ADDR(f) (reinterpret_cast<v8::internal::Address>(f))

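index starts at 0. code is a Code object (Code derives from HeapObject) and holds the address of the generated Builtin. FUNCTION_ADDR(Builtin_##Name) yields the address of the C++ function Builtin_##Name, and BuildAdaptor() uses that address when it creates the Builtin. The source of BuildAdaptor() is: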

Code BuildAdaptor(Isolate* isolate, int32_t builtin_index,
                  Address builtin_address, const char* name) {
  HandleScope scope(isolate);
  // Canonicalize handles, so that we can share constant pool entries pointing
  // to code targets without dereferencing their handles.
  CanonicalHandleScope canonical(isolate);
  constexpr int kBufferSize = 32 * KB;
  byte buffer[kBufferSize];
  MacroAssembler masm(isolate, BuiltinAssemblerOptions(isolate, builtin_index),
                      CodeObjectRequired::kYes,
                      ExternalAssemblerBuffer(buffer, kBufferSize));
  masm.set_builtin_index(builtin_index);
  DCHECK(!masm.has_frame());
  Builtins::Generate_Adaptor(&masm, builtin_address);
  CodeDesc desc;
  masm.GetCode(isolate, &desc);
  Handle<Code> code = Factory::CodeBuilder(isolate, desc, Code::BUILTIN)
                          .set_self_reference(masm.CodeObject())
                          .set_builtin_index(builtin_index)
                          .Build();
  return *code;
}

In the code above, Generate_Adaptor and Factory::CodeBuilder create the Builtin; code represents the Builtin's address.

Back in #define BUILD_CPP(Name), execution enters AddBuiltin, whose source is:

void SetupIsolateDelegate::AddBuiltin(Builtins* builtins, int index,
                                      Code code) {
  DCHECK_EQ(index, code.builtin_index());
  builtins->set_builtin(index, code);
}
//..............separator.......................
void Builtins::set_builtin(int index, Code builtin) {
  isolate_->heap()->set_builtin(index, builtin);
}
//.............separator..........................
void Heap::set_builtin(int index, Code builtin) {
  DCHECK(Builtins::IsBuiltinId(index));
  DCHECK(Internals::HasHeapObjectTag(builtin.ptr()));
  // The given builtin may be completely uninitialized thus we cannot check its
  // type here.
  isolate()->builtins_table()[index] = builtin.ptr();
}

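In the code above, Builtins::set_builtin() calls Heap::set_builtin(), which stores the Builtin in isolate()->builtins_table(). builtins_table() is declared as V8_INLINE Address*, so the table is used as an array of Address values with index as the subscript, and it holds all the Builtins. This completes Builtin initialization; Figure 2 shows the function call stack.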
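The registration flow described above (one shared index counter, a per-kind build step, then a store into an address table) can be mimicked in a few lines of ordinary C++. The sketch below is a simplification under loud assumptions: every Toy/TOY name is hypothetical, and the "builtins" are plain C++ functions whose addresses are stored directly, whereas V8 generates machine code and stores Code object pointers.

```cpp
#include <cassert>
#include <cstdint>

using Address = std::uintptr_t;  // stand-in for v8::internal::Address
using ToyFn = int (*)(int);      // hypothetical signature shared by all toy builtins

// Same idea as FUNCTION_ADDR: turn a function pointer into a plain address.
#define TOY_FUNCTION_ADDR(f) (reinterpret_cast<Address>(f))

struct ToyCode {
  int builtin_index;
  Address entry;
};

int Builtin_Double(int x) { return 2 * x; }
int Builtin_Negate(int x) { return -x; }
int Generate_Square(int x) { return x * x; }

// Hypothetical list with two kinds of entries.
#define TOY_LIST(CPP, TFS) CPP(Double) TFS(Square) CPP(Negate)

constexpr int kToyCount = 3;
Address toy_table[kToyCount];  // plays the role of builtins_table()

ToyCode BuildToyAdaptor(int index, Address address) { return {index, address}; }
ToyCode BuildToyStub(int index, ToyFn generator) {
  return {index, TOY_FUNCTION_ADDR(generator)};
}

void AddToyBuiltin(int index, ToyCode code) {
  assert(index == code.builtin_index);  // mirrors the DCHECK_EQ in AddBuiltin
  assert(index >= 0 && index < kToyCount);
  toy_table[index] = code.entry;
}

void SetupToyBuiltins() {
  int index = 0;
  ToyCode code;
#define TOY_BUILD_CPP(Name)                                             \
  code = BuildToyAdaptor(index, TOY_FUNCTION_ADDR(Builtin_##Name));     \
  AddToyBuiltin(index++, code);
#define TOY_BUILD_TFS(Name)                      \
  code = BuildToyStub(index, &Generate_##Name);  \
  AddToyBuiltin(index++, code);
  TOY_LIST(TOY_BUILD_CPP, TOY_BUILD_TFS)
#undef TOY_BUILD_CPP
#undef TOY_BUILD_TFS
  assert(index == kToyCount);  // every list entry was registered exactly once
}

// Looks up an entry by index and calls it through the stored address.
int CallToyBuiltin(int index, int arg) {
  assert(index >= 0 && index < kToyCount);
  return reinterpret_cast<ToyFn>(toy_table[index])(arg);
}
```

After SetupToyBuiltins() runs, CallToyBuiltin(1, 5) calls the entry registered through the TFS-style path. The point of the sketch is only that the kind of an entry selects the build step, while the index and the table are shared by all kinds.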

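The Builtin debugging method is summarized as follows:

(1) Expand the BUILTIN_LIST macro to obtain the index of each Builtin. The preprocessor in VS2019 can be used to expand the macro.

(2) Set a conditional breakpoint on index. Figure 3 shows how to trace Builtin number 12.

Setting a breakpoint in the Builtin's source is the simplest and most direct approach. If you do not know which kind a Builtin is (for example BUILD_CPP or BUILD_TFS), set a conditional breakpoint in each of the build functions. Figure 4 shows a breakpoint set in the Substring source.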

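Technical Summary

(1) Use a 7.x version of V8 to debug Builtins; later versions no longer have v8_use_snapshot;

(2) Build V8 with v8_optimized_debug = false to turn off compiler optimizations;

(3) Because builtin_index is an int32_t, use (int)builtin_index in the conditional breakpoint expression.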

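That's all for today. See you next time.

My abilities are limited, so there may be shortcomings and mistakes; corrections are welcome.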

WeChat: qq9123013 (add the note "v8交流", i.e. V8 discussion). Email: v8blink@outlook.com