Python Source Code Study (7): The String Object

By | 2014/07/05

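Having studied Python's integer object in the earlier posts, the String object should be easier to follow. Let us start with PyStringObject and PyString_Type. PyStringObject is defined in stringobject.h as follows: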

typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];

    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the string or -1 if not computed yet.
     *     ob_sstate != 0 iff the string object is in stringobject.c's
     *       'interned' dictionary; in this case the two references
     *       from 'interned' to this object are *not counted* in ob_refcnt.
     */
} PyStringObject;
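
The invariants listed in the comment above can be illustrated with a small stand-alone sketch. MiniString, mini_string_alloc_size and mini_string_new are hypothetical names made up for this illustration, not CPython APIs, and the header is reduced to a plain size field instead of the real PyObject_VAR_HEAD. The sketch only shows how one allocation holds both the header and the character buffer, using the same trailing one-element array idiom as the struct above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for PyStringObject (assumption: no refcount or type pointer). */
typedef struct {
    size_t ob_size;    /* number of bytes in the string, not counting the NUL */
    long   ob_shash;   /* cached hash; -1 means "not computed yet" */
    int    ob_sstate;  /* 0 means "not interned" */
    char   ob_sval[1]; /* first byte of the trailing character buffer */
} MiniString;

/* Bytes needed for a string of `size` characters: header + characters + NUL. */
static size_t mini_string_alloc_size(size_t size)
{
    return offsetof(MiniString, ob_sval) + size + 1;
}

/* Allocate a MiniString and copy `size` bytes from `data` into it. */
static MiniString *mini_string_new(const char *data, size_t size)
{
    MiniString *op;

    assert(data != NULL);
    op = malloc(mini_string_alloc_size(size));
    if (op == NULL)
        return NULL;
    op->ob_size = size;
    op->ob_shash = -1;            /* hash is computed lazily */
    op->ob_sstate = 0;            /* not interned */
    memcpy(op->ob_sval, data, size);
    op->ob_sval[size] = '\0';     /* invariant: ob_sval[ob_size] == 0 */
    return op;
}
```

Calling mini_string_new("abc", 3) would allocate the header plus four bytes, so the buffer has room for 'ob_size+1' elements as the invariant requires. The caller releases the object with free().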

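PyObject_VAR_HEAD contains an ob_size field, and the length of the string is given by ob_size: the character buffer that starts at ob_sval holds ob_size + 1 bytes. This is how every variable-length object in Python is implemented. Every Python string ends with '\0', that is, ob_sval[ob_size] == 0. ob_shash starts out as -1 and caches the hash once it has been computed; this cached value plays a very large role in Python's dict. The hash is computed as follows: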

static long
string_hash(PyStringObject *a)
{
    register Py_ssize_t len;
    register unsigned char *p;
    register long x;

#ifdef Py_DEBUG
    assert(_Py_HashSecret_Initialized);
#endif
    if (a->ob_shash != -1)
        return a->ob_shash;
    len = Py_SIZE(a);
    /*
      We make the hash of the empty string be 0, rather than using
      (prefix ^ suffix), since this slightly obfuscates the hash secret
    */
    if (len == 0) {
        a->ob_shash = 0;
        return 0;
    }
    p = (unsigned char *) a->ob_sval;
    x = _Py_HashSecret.prefix;
    x ^= *p << 7;
    while (--len >= 0)
        x = (1000003*x) ^ *p++;
    x ^= Py_SIZE(a);
    x ^= _Py_HashSecret.suffix;
    if (x == -1)
        x = -2;
    a->ob_shash = x;
    return x;
}
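
To make the caching behaviour easier to see in isolation, here is a minimal sketch of the same computation outside of CPython. HashedString, HashSecret, demo_secret and hashed_string_hash are hypothetical names used only for this illustration, and the secret is fixed at zero as an assumption. Unlike the original, the sketch does its arithmetic in unsigned long to avoid signed overflow, then converts back to long at the end.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for _Py_HashSecret (assumption: both values fixed at 0). */
typedef struct {
    long prefix;
    long suffix;
} HashSecret;

static const HashSecret demo_secret = {0, 0};

/* Minimal string record with a cached hash slot. */
typedef struct {
    size_t len;                /* number of bytes in data */
    long cached_hash;          /* -1 means "not computed yet" */
    const unsigned char *data; /* string bytes */
} HashedString;

static long hashed_string_hash(HashedString *s)
{
    unsigned long x;
    long result;
    size_t i;

    /* Fast path: reuse the hash computed by an earlier call. */
    if (s->cached_hash != -1)
        return s->cached_hash;
    /* The empty string always hashes to 0. */
    if (s->len == 0) {
        s->cached_hash = 0;
        return 0;
    }
    x = (unsigned long)demo_secret.prefix;
    x ^= (unsigned long)s->data[0] << 7;
    for (i = 0; i < s->len; i++)
        x = (1000003UL * x) ^ s->data[i];
    x ^= (unsigned long)s->len;
    x ^= (unsigned long)demo_secret.suffix;
    result = (long)x;
    /* -1 is reserved for "not computed yet", so remap it to -2. */
    if (result == -1)
        result = -2;
    s->cached_hash = result;
    return result;
}
```

The first call walks the bytes and stores the result; every later call returns from the first if statement without touching the data. Remapping -1 to -2 keeps -1 free as the "not computed" marker.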

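Caching the hash inside the string object, together with the intern mechanism, improves Python's execution efficiency by roughly 20%. Now let us look at PyString_Type, which is defined as follows: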

PyTypeObject PyString_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "str",
    PyStringObject_SIZE,
    sizeof(char),
    string_dealloc,                             /* tp_dealloc */
    (printfunc)string_print,                    /* tp_print */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_compare */
    string_repr,                                /* tp_repr */
    &string_as_number,                          /* tp_as_number */
    &string_as_sequence,                        /* tp_as_sequence */
    &string_as_mapping,                         /* tp_as_mapping */
    (hashfunc)string_hash,                      /* tp_hash */
    0,                                          /* tp_call */
    string_str,                                 /* tp_str */
    PyObject_GenericGetAttr,                    /* tp_getattro */
    0,                                          /* tp_setattro */
    &string_as_buffer,                          /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_CHECKTYPES |
        Py_TPFLAGS_BASETYPE | Py_TPFLAGS_STRING_SUBCLASS |
        Py_TPFLAGS_HAVE_NEWBUFFER,              /* tp_flags */
    string_doc,                                 /* tp_doc */
    0,                                          /* tp_traverse */
    0,                                          /* tp_clear */
    (richcmpfunc)string_richcompare,            /* tp_richcompare */
    0,                                          /* tp_weaklistoffset */
    0,                                          /* tp_iter */
    0,                                          /* tp_iternext */
    string_methods,                             /* tp_methods */
    0,                                          /* tp_members */
    0,                                          /* tp_getset */
    &PyBaseString_Type,                         /* tp_base */
    0,                                          /* tp_dict */
    0,                                          /* tp_descr_get */
    0,                                          /* tp_descr_set */
    0,                                          /* tp_dictoffset */
    0,                                          /* tp_init */
    0,                                          /* tp_alloc */
    string_new,                                 /* tp_new */
    PyObject_Del,                               /* tp_free */
};

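As the code above shows, the tp_as_number, tp_as_sequence and tp_as_mapping slots of the String type are all set, which means PyStringObject supports numeric, sequence and mapping operations (the numeric slot table is what backs the % formatting operator). Next, let us look at two figures: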
[Figure: Screen Shot 2014-07-05 at 下午3.55.57, https://www.androiddev.net/wp-content/uploads/2014/07/Screen-Shot-2014-07-05-at-下午3.55.57.png]

[Figure: Screen Shot 2014-07-05 at 下午5.58.32]
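With the two figures above, we get a fairly clear picture of the runtime state of a string object in Python and of the memory it occupies. A later post will cover a very important mechanism: string interning.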
