KivEnt: Expanding Kivy's Support to Types Other than GLfloat

Thu 02 July 2015

Introduction - Vertex Data in GL

When GL renders an object, the data used to calculate the actual on-screen pixel comes from a few places.

  1. Uniform Data: Is the same for every vertex of every object being rendered using the same shader.

  2. Vertex Data: Varies per vertex. Every single vertex of every single object will have its own copy of all attributes dictated by the vertex format.

Kivy's built-in VertexFormat only supports submitting floating point data, a decision that makes declaring a vertex format straightforward. The default format looks like:

[
(b'v_pos', 2, b'float'),
(b'v_tc', 2, b'float'),
]

We have a 2 float tuple holding position information, named v_pos, and a 2 float tuple holding texture information, named v_tc. At this time the only option for the third member is float. This allows Kivy to assume your vertex data resides in a tightly packed array of floats, and it only needs to use the count of floats to determine the position in the array of any given attribute.
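To make the "count the floats" rule concrete, here is a minimal Python sketch of how an offset can be derived when every attribute is a tightly packed float. The helper name packed_float_offsets is my own illustration and is not part of Kivy's API; the only assumption is that a GLfloat is 4 bytes.

```python
# Hypothetical helper (not a Kivy function): byte offsets for a float-only
# format, assuming tightly packed 4-byte floats.
FLOAT_SIZE = 4  # size of a GLfloat in bytes


def packed_float_offsets(vertex_format):
    """Return ({name: byte_offset}, total_vertex_size_in_bytes)."""
    offsets = {}
    floats_so_far = 0
    for name, count, _type in vertex_format:
        # The attribute starts right after all the floats before it.
        offsets[name] = floats_so_far * FLOAT_SIZE
        floats_so_far += count
    return offsets, floats_so_far * FLOAT_SIZE


default_format = [
    (b'v_pos', 2, b'float'),
    (b'v_tc', 2, b'float'),
]

offsets, vertex_size = packed_float_offsets(default_format)
print(offsets, vertex_size)
```

This shortcut only holds while every member has the same size, which is exactly the assumption that mixing types breaks.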

Optimization Goal - Reducing Memory Usage

A huge portion of the time spent drawing a frame is spent writing out memory contents for the vertex data arrays, as these can get quite large. ES2.0 can draw up to 65,535 vertices per draw call, so for a simple vertex format representing x, y pos and u, v texture coordinates, like the one Kivy uses, we could expect to write up to 1,048,560 bytes per frame per draw call: 65,535 (max range of unsigned short) * 4 (number of floats per vertex) * 4 (size of a float in bytes).

However, a float-only format requires that we store any per-vertex information at one of the largest type sizes, which increases the memory required to render visualizations whose data could be packed into a smaller type. If we wanted to assign a color per vertex to be used in rendering, we would have to double the size of the submitted data, as it would take 4 floats to store the color information for the r, g, b, a channels.

Using Kivy's format declaration:

[
(b'v_pos', 2, b'float'),
(b'v_tc', 2, b'float'),
(b'v_color', 4, b'float'),
]

We could instead store this color data as unsigned chars, which have a range of 0-255, matching the rgba colors you are probably familiar with. We want to do something like:

[
(b'v_pos', 2, b'float'),
(b'v_tc', 2, b'float'),
(b'v_color', 4, b'ubyte'),
]

This would allow our vertex to take up 20 bytes of data instead of 32. However, it is not quite that simple. There is a reason Kivy currently only supports float type while technically having a field for customizing the type.
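As a quick sanity check of these sizes, the following Python sketch uses the standard struct module to measure each layout with no padding and scales it up to a full draw call. The variable names are mine, and the 65,535 vertex limit is simply the figure used earlier in this post.

```python
import struct

MAX_VERTICES = 65535  # vertices per draw call, as assumed above

# '<' asks struct for standard sizes with no implicit padding.
layouts = {
    'pos + tc (floats)': '<4f',
    'pos + tc + float color': '<8f',
    'pos + tc + ubyte color': '<4f4B',
}

# Size of a single vertex in bytes for each layout.
sizes = {label: struct.calcsize(fmt) for label, fmt in layouts.items()}

for label, size in sizes.items():
    print(label, size, 'bytes per vertex,',
          size * MAX_VERTICES, 'bytes per draw call')
```

Storing the color as unsigned bytes saves 12 bytes per vertex, or roughly 786 KB across a full draw call.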

The Problem of Alignment

If we are going to mix types, we can no longer assume that our data is tightly packed into an array. If a type does not take up the full space in a machine word and would cause the next entry to sit on the boundary between 2 words, we would instead insert some extra spacing to ensure the next type begins on the next machine word. This practice is known as data alignment.

To illustrate, we have a vertex format that looks like:

[
(b'b1', 1, b'byte'),
(b'b2', 1, b'byte'),
(b'f1', 1, b'float'),
(b'f2', 1, b'float'),
]

On a 32-bit system, a machine word is 4 bytes long: this vertex format without padding would require 10 bytes of data, so 3 machine words:

Word 1 | x | x | x | x | Word 2 | x | x | x | x | Word 3 | x | x | x | x |

unaligned:

Word 1 | b1 | b2 | f1 | f1 | Word 2 | f1 | f1 | f2 | f2 | Word 3 | f2 | f2 | x | x |

Note how f1 and f2 are split between words. The CPU will now have to spend extra time to read this data, as it must look in both Word 1 and Word 2 to find the value of f1. It would be much better to format our data with some extra space after the bytes so that the next value begins at the start of the next word:

aligned:

Word 1 | b1 | b2 | x | x | Word 2 | f1 | f1 | f1 | f1 | Word 3 | f2 | f2 | f2 | f2 |

Leaving space like this is known as padding. It could be done manually if you really wanted to, but you must keep the CPU architecture in mind to know what word size to align to. Luckily, C provides us with another method of handling this type of data: declaring a C struct. A struct will have padding automatically introduced by the compiler when necessary.
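You can watch the compiler's padding rules at work without writing any C. The sketch below is plain Python, not KivEnt code: it declares the same two-byte, two-float layout with ctypes, which follows the platform's native alignment, and compares it with the unpadded size. It assumes a platform where a float is 4 bytes and aligned to 4 bytes, which is the common case.

```python
import ctypes
import struct


class PaddedVertex(ctypes.Structure):
    # ctypes lays fields out the way the platform's C compiler would.
    _fields_ = [
        ('b1', ctypes.c_byte),
        ('b2', ctypes.c_byte),
        ('f1', ctypes.c_float),
        ('f2', ctypes.c_float),
    ]


# '=' means standard sizes with no padding: 1 + 1 + 4 + 4 bytes.
unpadded_size = struct.calcsize('=bbff')
padded_size = ctypes.sizeof(PaddedVertex)
# Byte offset of each field from the start of the struct.
offsets = {name: getattr(PaddedVertex, name).offset
           for name, _ctype in PaddedVertex._fields_}

print(unpadded_size, padded_size, offsets)
```

The two bytes of padding after b2 are what push f1 to the start of the second word.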

Using C Structs in Cython

In KivEnt, I have chosen to require static declaration of vertex formats using C structs. This both gives us a more performant rendering loop for turning model data and game state into frame data, and solves the alignment problems that come with interleaving data of different types. The downside is that when we make changes to rendering behavior, we must recompile the affected KivEnt module before we can work with our new code.

We can declare a C struct like this:

ctypedef struct VertexFormat4F:
    GLfloat[2] v_pos
    GLfloat[2] v_tc

Typically we will place the declaration in a .pxd header file so that the type is accessible from other modules that may want to work with it.

We must consider one other thing to make this work, though. Our original algorithm for binding the vertex attributes always computes an offset from the combined size in bytes of the attributes that came before, which means it does not account for padding. Whenever we render something in GL, we must tell it about the attributes the shader expects with a call to the glVertexAttribPointer function. An invocation looks like this:

glVertexAttribPointer(attr.index, attr.size, attr.type,
    GL_FALSE, <GLsizei>self.format_size, <GLvoid*><long>offset)

The first argument is the index of the attribute, as set by GL when we register the vertex attributes. The second is the size of the attribute, that is, how many values are in the array; for our v_pos and v_tc this would be 2. The third is the type of the attribute; here it would be GL_FLOAT.

The next argument is a bool that determines whether or not the value will be normalized using the bounds of the range of its type. This can be very useful, as many types of GL processing use normalized values as inputs; for instance, both texture coordinates and colors are normalized. The following argument is the total size of a vertex: the sizeof your struct in KivEnt, or the sum of 4 * count over all attributes in the Kivy approach of using floating point arrays. Finally, the last argument, the offset, is where we pass in the information about padding. We tell GL how many bytes into each struct a particular attribute begins. In default Kivy this is done by simply calculating how many floats have come before, but now that we must account for padding we need a new calculation.

C typically provides a macro called offsetof to aid in this type of calculation. offsetof returns the distance in bytes from the beginning of a struct to a particular member of that struct in memory. However, we do not have access to this macro in Cython. Fortunately, we can emulate the action of the macro. It will be a little more verbose, but ultimately we will get the information we need in a portable, cross-platform way. Creating our vertex format in KivEnt requires one last step: extracting the offset of each attribute in the format struct and storing them in a list much like the original Kivy declaration, but with a few extra fields.

#Import the appropriate types for performing the calculation from python.
from cython cimport Py_ssize_t
cdef extern from "Python.h":
    ctypedef int Py_intptr_t

#Create a pointer to the struct type, in this case a VertexFormat4F, by 
#casting the NULL pointer to the type. This ensures that we have a pointer
#that begins at the very beginning of the data. 
cdef VertexFormat4F* tmp1 = <VertexFormat4F*>NULL
#Now, we can calculate the distance between an attribute and the start of 
#the struct like so:
pos_offset = <Py_ssize_t> (<Py_intptr_t>(tmp1.v_pos) - <Py_intptr_t>(tmp1))
#we are effectively subtracting the position in bytes of the start of our particular 
#struct from the position in bytes of the member of the struct whose location we want:
tc_offset = <Py_ssize_t> (<Py_intptr_t>(tmp1.v_tc) - <Py_intptr_t>(tmp1))

vertex_format_4f = [
    (b'v_pos', 2, b'float', pos_offset, False), 
    (b'v_tc', 2, b'float', tc_offset, False),
    ]

KivEnt introduces a boolean to control whether or not a value should be normalized in addition to the offsetof value.
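If you want to experiment with the pointer-difference trick outside of Cython, here is a rough Python equivalent using ctypes. It is only a sketch under my own assumptions: the ctypes class stands in for the C struct, and it measures addresses on a real instance rather than on a NULL pointer. ctypes also exposes the offset directly on the field descriptor, which lets us check the subtraction.

```python
import ctypes


class VertexFormat4F(ctypes.Structure):
    # Stand-in for the Cython struct of the same name.
    _fields_ = [
        ('v_pos', ctypes.c_float * 2),
        ('v_tc', ctypes.c_float * 2),
    ]


def emulated_offsetof(instance, member_name):
    """Address of the member minus the address of the struct start."""
    member = getattr(instance, member_name)
    return ctypes.addressof(member) - ctypes.addressof(instance)


tmp = VertexFormat4F()
pos_offset = emulated_offsetof(tmp, 'v_pos')
tc_offset = emulated_offsetof(tmp, 'v_tc')

# Same shape as the KivEnt list: name, count, type, offset, normalize.
vertex_format_4f = [
    (b'v_pos', 2, b'float', pos_offset, False),
    (b'v_tc', 2, b'float', tc_offset, False),
]
print(vertex_format_4f, ctypes.sizeof(VertexFormat4F))
```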

Now let's declare our bytes vertex color using our new system:

from cython cimport Py_ssize_t
#We will import the format_registrar so that we can tell the game engine
#about the new format.
from kivent_core.rendering.vertex_formats cimport format_registrar
#Import the types we will need for our format from Kivy's c_opengl module.
from kivy.graphics.c_opengl cimport GLfloat, GLubyte
cdef extern from "Python.h":
    ctypedef int Py_intptr_t

ctypedef struct VertexFormat4F4UB:
    GLfloat[2] v_pos
    GLfloat[2] v_tc
    GLubyte[4] v_color

cdef VertexFormat4F4UB* tmp1 = <VertexFormat4F4UB*>NULL
pos_offset = <Py_ssize_t> (<Py_intptr_t>(tmp1.v_pos) - <Py_intptr_t>(tmp1))
tc_offset = <Py_ssize_t> (<Py_intptr_t>(tmp1.v_tc) - <Py_intptr_t>(tmp1))
color_offset = <Py_ssize_t> (<Py_intptr_t>(tmp1.v_color) - <Py_intptr_t>(tmp1))

vertex_format_4f4ub = [
    (b'v_pos', 2, b'float', pos_offset, False), 
    (b'v_tc', 2, b'float', tc_offset, False),
    #We normalize the color values, this will result in the val / 255 for a ubyte
    (b'v_color', 4, b'ubyte', color_offset, True),
    ]

#Finally, we must register our new format with the game engine
format_registrar.register_vertex_format('vertex_format_4f4ub', 
    vertex_format_4f4ub, sizeof(VertexFormat4F4UB))

This is a little verbose compared to the Kivy declaration, but it gives us more power and performance than is possible entirely from Python. Having the struct type will also come in handy when writing our rendering loop. If you're curious, you can read a more detailed tutorial explaining the render loop.
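To double check the layout of the new format and see what normalization does to a color, here is a small Python sketch. The ctypes class mirrors the struct above, and normalize_ubyte is a hypothetical helper of mine that mimics the val / 255 mapping GL applies to a normalized unsigned byte; neither is part of KivEnt.

```python
import ctypes


class VertexFormat4F4UB(ctypes.Structure):
    # Stand-in for the Cython struct of the same name.
    _fields_ = [
        ('v_pos', ctypes.c_float * 2),
        ('v_tc', ctypes.c_float * 2),
        ('v_color', ctypes.c_ubyte * 4),
    ]


def normalize_ubyte(value):
    """Map an unsigned byte in 0-255 onto the range 0.0-1.0."""
    return value / 255.0


vertex_size = ctypes.sizeof(VertexFormat4F4UB)
color_offset = VertexFormat4F4UB.v_color.offset

vertex = VertexFormat4F4UB()
for i, channel in enumerate((255, 128, 0, 255)):
    vertex.v_color[i] = channel

# What the shader would receive for this color once normalized.
shader_color = [normalize_ubyte(c) for c in vertex.v_color]
print(vertex_size, color_offset, shader_color)
```

The whole vertex fits in 20 bytes, and the shader still sees the color as four floats between 0.0 and 1.0.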

Wrapping Arbitrary Format structs in Python

While I'm perfectly happy requiring that the actual data format for a vertex be declared statically, I wanted a Python object that can read and write a vertex no matter its format, so that it is easy to extend KivEnt's supported formats without having to modify too many parts of the engine. The addition of our extra offset data actually allows us to accomplish this easily. First, we will declare an object that takes a void pointer to the location in memory of its associated struct, and a dictionary form of the vertex format list whose keys are the attribute names and whose values are the remaining members of each attribute tuple.

#Import all the GL types.
from kivy.graphics.c_opengl cimport (GLfloat, GLbyte, GLubyte, GLint, GLuint,
GLshort, GLushort)

cdef class Vertex:
    cdef dict vertex_format
    cdef void* vertex_pointer

    def __cinit__(self, dict format):
        self.vertex_format = format

Note that we still must set the pointer to a specific location before using our Vertex object, or we may have unintended results. The VertexModel class does this for you, but if you are ever using a Vertex manually, you must do something like:

v = Vertex(format_dict)
v.vertex_pointer = <void*>yourstruct
#now it's safe to use.
v.v_pos = (1., 1.)

We will write a custom __getattr__ so that we can retrieve the data of a certain attribute from the array:

def __getattr__(self, name):
    #These values will be looked up inside the format dict
    cdef int count
    cdef unsigned int offset
    cdef bytes attr_type
    #We cast our pointer to a char* pointer so that we can index into it using bytes.
    cdef char* data = <char*>self.vertex_pointer
    #We must forward declare pointers to arrays of each GL type we can use:
    cdef GLfloat* f_data
    cdef GLint* i_data
    cdef GLuint* ui_data
    cdef GLshort* s_data
    cdef GLushort* us_data
    cdef GLbyte* b_data
    cdef GLubyte* ub_data
    #If we are using py3, we need to cast our name str to bytes as GL cannot
    #use unicode attr names.
    if isinstance(name, unicode):
        name = bytes(name, 'utf-8')
    if name in self.vertex_format:
        #If we find the name in the format, retrieve the count, type, and offset from the format dict
        attribute_tuple = self.vertex_format[name]
        count = attribute_tuple[0]
        attr_type = attribute_tuple[1]
        offset = attribute_tuple[2]
        #Check the type to use
        if attr_type == b'float':
            #Index into the char array by the offset in bytes, and then cast to the appropriate type
            f_data = <GLfloat*>&data[offset]
            #Retrieve count values, casting from a GL type to a normal C type, which 
            #will then be converted to a python type automatically by Cython.
            ret = [<float>f_data[x] for x in range(count)]
        #Now we just do the same thing for all the other types.
        elif attr_type == b'int':
            i_data = <GLint*>&data[offset]
            ret = [<int>i_data[x] for x in range(count)]
        elif attr_type == b'uint':
            ui_data = <GLuint*>&data[offset]
            ret = [<unsigned int>ui_data[x] for x in range(count)]
        elif attr_type == b'short':
            s_data = <GLshort*>&data[offset]
            ret = [<short>s_data[x] for x in range(count)]
        elif attr_type == b'ushort':
            us_data = <GLushort*>&data[offset]
            ret = [<unsigned short>us_data[x] for x in range(count)]
        elif attr_type == b'byte':
            b_data = <GLbyte*>&data[offset]
            ret = [<signed char>b_data[x] for x in range(count)]
        elif attr_type == b'ubyte':
            ub_data = <GLubyte*>&data[offset]
            ret = [<unsigned char>ub_data[x] for x in range(count)]
        else:
            #Raise a TypeError if the attr_type is not one of the available.
            raise TypeError()
        #If our return has only one value, let's return it instead of a list of length 1
        if len(ret) == 1:
            return ret[0]
        else:
            return ret
    else:
        #Raise an attribute error if the name is not in the format.
        raise AttributeError()

This is a little verbose, as we must explicitly deal with each of the types, but nothing too arduous. Keep in mind we will always return a copy of the data, not the data itself. This means you cannot hold on to a returned list or modify it in place and expect the underlying data to change. We must implement a custom __setattr__ to modify the data in the underlying memory:

def __setattr__(self, name, value):
    #Like before, predeclare our format tuple values
    cdef int count
    cdef unsigned int offset
    cdef bytes attr_type
    #Cast the data pointer to a char pointer
    cdef char* data = <char*>self.vertex_pointer
    #Predeclare the typed array pointers
    cdef GLfloat* f_data
    cdef GLint* i_data
    cdef GLuint* ui_data
    cdef GLshort* s_data
    cdef GLushort* us_data
    cdef GLbyte* b_data
    cdef GLubyte* ub_data
    #If the data is a tuple cast it to a list
    if isinstance(value, tuple):
        value = list(value)
    #If the data is a single value, turn it into a list for algorithmic simplicity
    if not isinstance(value, list):
        value = [value]
    #If the name is unicode cast to bytes
    if isinstance(name, unicode):
        name = bytes(name, 'utf-8')
    if name in self.vertex_format:
        #Check if the name is in the format and get the other values
        attribute_tuple = self.vertex_format[name]
        count = attribute_tuple[0]
        attr_type = attribute_tuple[1]
        offset = attribute_tuple[2]
        #If the setting data is not the right size raise an exception.
        if len(value) != count:
            raise AttributeCountError('Expected list of length {count} got '
                'list of size {length}'.format(count=count, 
                length=len(value)))
        for x in range(count):
            if attr_type == b'float':
                #Index into the memory by the offset + the number of bytes taken up by 
                #previous values if we have an array, casting to the appropriate type.
                f_data = <GLfloat*>&data[offset + x*sizeof(GLfloat)]
                #Cast the setting value to the appropriate type and assign it to the location in memory.
                f_data[0] = <GLfloat>value[x]
            #Do the same for all the other types.
            elif attr_type == b'int':
                i_data = <GLint*>&data[offset + x*sizeof(GLint)]
                i_data[0] = <GLint>value[x]
            elif attr_type == b'uint':
                ui_data = <GLuint*>&data[offset + x*sizeof(GLuint)]
                ui_data[0] = <GLuint>value[x]
            elif attr_type == b'short':
                s_data = <GLshort*>&data[offset + x*sizeof(GLshort)]
                s_data[0] = <GLshort>value[x]
            elif attr_type == b'ushort':
                us_data = <GLushort*>&data[offset + x*sizeof(GLushort)]
                us_data[0] = <GLushort>value[x]
            elif attr_type == b'byte':
                b_data = <GLbyte*>&data[offset + x*sizeof(GLbyte)]
                b_data[0] = <GLbyte>value[x]
            elif attr_type == b'ubyte':
                ub_data = <GLubyte*>&data[offset + x*sizeof(GLubyte)]
                ub_data[0] = <GLubyte>value[x]
            else:
                #Raise error if type is unhandled.
                raise TypeError()
    else:
        #Raise error if name isn't in format.
        raise AttributeError()
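The same read and write logic can be prototyped in pure Python, which is handy for checking a format dict before compiling anything. The sketch below is my own approximation, not KivEnt code: a bytearray stands in for the struct's memory, the standard struct module replaces the pointer casts, and a plain ValueError stands in for the count error. The offsets are the ones for the 4 float, 4 ubyte format.

```python
import struct

# struct codes for the type names used in the format tuples.
TYPE_CODES = {b'float': 'f', b'int': 'i', b'uint': 'I', b'short': 'h',
              b'ushort': 'H', b'byte': 'b', b'ubyte': 'B'}


def _lookup(format_dict, name):
    if name not in format_dict:
        raise AttributeError(name)
    count, attr_type, offset = format_dict[name][:3]
    if attr_type not in TYPE_CODES:
        raise TypeError(attr_type)
    # '=' means native byte order, standard sizes, no padding.
    return count, '=%d%s' % (count, TYPE_CODES[attr_type]), offset


def read_attribute(memory, format_dict, name):
    count, fmt, offset = _lookup(format_dict, name)
    values = list(struct.unpack_from(fmt, memory, offset))  # a copy
    return values[0] if count == 1 else values


def write_attribute(memory, format_dict, name, value):
    count, fmt, offset = _lookup(format_dict, name)
    values = list(value) if isinstance(value, (list, tuple)) else [value]
    if len(values) != count:
        raise ValueError('Expected %d values, got %d' % (count, len(values)))
    struct.pack_into(fmt, memory, offset, *values)


format_dict = {
    b'v_pos': (2, b'float', 0, False),
    b'v_tc': (2, b'float', 8, False),
    b'v_color': (4, b'ubyte', 16, True),
}
vertex_memory = bytearray(20)  # one zeroed vertex

write_attribute(vertex_memory, format_dict, b'v_pos', (1.0, 2.0))
write_attribute(vertex_memory, format_dict, b'v_color', [255, 128, 0, 255])

pos = read_attribute(vertex_memory, format_dict, b'v_pos')
pos[0] = 99.0  # changes only the copy, not the vertex memory
pos_after = read_attribute(vertex_memory, format_dict, b'v_pos')
print(pos, pos_after)
```

As with the Cython version, modifying the list returned by a read leaves the vertex untouched; only a write changes the underlying bytes.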

With that, the final piece is in place. We can now declare C structs that describe our vertex data and hold all our rendering information in them to create efficient and performant rendering loops in Cython, while also being able to access all the data from Python using the Vertex class.
