## Tuesday, 21 February 2012

### A bit of Parsing

In yesterday's post I talked about some updates to the ShaderLib to allow registering of uniforms, and as I was driving into work I decided to add some more code to parse the shader source, look for any lines containing the word uniform, and automatically add them to the uniforms list.

The GLSL spec contains the full grammar for the language, and we can break the declaration of uniforms down into the following cases:

```glsl
// easy case
// uniform type name;
uniform vec3 position;

// more difficult case
// uniform type name[size];

#define numLights 10
uniform Lights lights[numLights];

// or

uniform Lights lights[3];
```

The simple case of a single uniform is easy to parse and can be accomplished by reading the shader source and looking for lines containing the uniform keyword.

The array version, however, is more complex, as we need to determine which of two different cases we have: either a hard-coded array size or a constant defined by a #define statement.

The lucky thing is that the #define must always appear before it is used, so we can store these values as we encounter them and look them up later.

The downside of this is the fact that each element of the array must be registered.

This is further compounded by the fact that the data type may be a structure. I'm not going to worry about the structure case here and will only deal with plain arrays of the form uniform vec3 points[3], for example.

#### Let's get a boost++

In general C++ string processing is not that good; however, there are several really good libraries in boost that allow loads of different string processing. For this example I'm going to use boost::tokenizer, boost::format, boost::split and boost::lexical_cast. All of these are header-only templated classes, so they don't require external compiled libs, and all work well with the usual STL containers.

#### Overview of the process

I've created a new method for the ShaderProgram class which loops over each attached shader and grabs the shader source code string (the whole shader is stored as a single std::string). This is then split on the \n and \r characters. Each line is then searched for the keywords "uniform" and "#define". The code to do this is as follows.
```cpp
void ShaderProgram::autoRegisterUniforms()
{
  // the number of attached shaders (size) and each shader's source
  // are fetched from the program's shader list (details omitted here)
  const std::string *source;
  std::vector<std::string> lines;

  for(unsigned int i=0; i<size; ++i)
  {
    // first grab all of the shader source for this program
    // and split on new lines
    boost::split(lines, *source, boost::is_any_of("\n\r"));

    // now we loop over the lines looking for the uniform keyword
    // or the #define keyword
    std::vector<std::string>::iterator start=lines.begin();
    std::vector<std::string>::iterator end=lines.end();
    std::map<std::string,int> defines;

    while(start!=end)
    {
      if(start->find("#define") !=std::string::npos)
      {
        int value;
        std::string define;
        if( parseHashDefine(*start,define,value) )
        {
          defines[define]=value;
        }
      }
      // see if we have uniform in the string
      else if (start->find("uniform") !=std::string::npos)
      {
        parseUniform(*start,defines);
      }
      // go to the next line
      ++start;
    }
  }
}
```

To find whether the string contains our keywords we use the std::string find method, which returns a value not equal to std::string::npos if the substring is found (the index, in fact, but we don't need it here).

If this is found we can process the respective values.
#### Parsing #define

To store any #define values I'm going to use a std::map<std::string,int>, and also make the assumption that we will always have the form #define name [int]. This is a very safe assumption, as array subscripts must always be positive, and in the case of GLSL most implementations only allow very small arrays anyway.

The overall process is quite simple: I tokenise the string as above, but now we copy the tokens into a std::vector<std::string> using the assign method and then use subscripts to access the elements we want, as shown below

```cpp
bool ShaderProgram::parseHashDefine(
                                     const std::string &_s,
                                     std::string &o_name,
                                     int &o_value
                                   ) const
{
  // typedef our tokenizer for speed and clarity
  typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
  // these are the separators we are looking for (not droids ;-)
  boost::char_separator<char> sep(" \t\r\n");
  // generate our tokens based on the separators above
  tokenizer tokens(_s, sep);
  // now we will copy them into a std::vector to process
  std::vector<std::string> data;
  // the tokenizer gives us no size or random access, so copy to a vector
  data.assign(tokens.begin(),tokens.end());
  // we are parsing "#define name value" so check we have this format
  // we should, as the glsl compiler will fail if we don't, but best to be sure
  if(data.size() !=3)
  {
    return false;
  }
  else
  {
    //          data [0]    [1]   [2]
    // we are parsing #define name value
    o_name=data[1];
    o_value=boost::lexical_cast<int>(data[2]);
    // all was good so return true
    return true;
  }
}
```

You will notice that boost::lexical_cast is used to convert the string into an integer, and that both values are returned to the caller and stored in the defines map, ready to be passed into the next function to process the uniforms.
#### The parseUniform method

The parseUniform method is split into two sections. We first check whether the string contains a [, as this determines which parse we need. If no [] is present we can process in a similar way to above; however, if there is an array we need to refine the parse further, and then loop and register all of the uniforms. For example, the following GLSL code

```glsl
#define numIndices 4
uniform int indices[numIndices];
```

would need every element of the array registered, so we would register

```
indices[0]
indices[1]
indices[2]
indices[3]
```

The following code shows the complete method

```cpp
void ShaderProgram::parseUniform(
                                  const std::string &_s,
                                  const std::map<std::string,int> &_defines
                                )
{
  typedef boost::tokenizer<boost::char_separator<char> > tokenizer;

  // first let's see what we need to parse
  if(_s.find("[") ==std::string::npos)
  {
    boost::char_separator<char> sep(" \t\r\n;");
    // generate our tokens based on the separators above
    tokenizer tokens(_s, sep);
    // now we will copy them into a std::vector to process
    std::vector<std::string> data;
    // the tokenizer gives us no size or random access, so copy to a vector
    data.assign(tokens.begin(),tokens.end());
    // uniform type name
    // the glsl compiler will fail if we don't have this, but best to be sure
    if(data.size() >=3)
    {
      registerUniform(data[2]);
    }
  }
  else
  {
    boost::char_separator<char> sep(" []\t\r\n;");
    // generate our tokens based on the separators above
    tokenizer tokens(_s, sep);
    // now we will copy them into a std::vector to process
    std::vector<std::string> data;
    // the tokenizer gives us no size or random access, so copy to a vector
    data.assign(tokens.begin(),tokens.end());
    // uniform type name [ size ] gives us four tokens
    if(data.size() >=4)
    {
      // in this case data[3] is either a number or a constant
      int arraySize=0;
      // so we try to convert it as if it were a number
      try
      {
        arraySize=boost::lexical_cast<int>(data[3]);
      }
      // if it wasn't a number, catch and look the name up in the defines map
      catch(boost::bad_lexical_cast &)
      {
        std::map<std::string,int>::const_iterator def=_defines.find(data[3]);
        if(def !=_defines.end())
        {
          arraySize=def->second;
        }
      } // end catch
      // now loop and register each of the uniforms
      for(int i=0; i<arraySize; ++i)
      {
        // convert our uniform and register
        std::string uniform=boost::str(boost::format("%s[%d]") %data[2] %i);
        registerUniform(uniform);
      }
    }
  }
}
```

Most of the code is similar to the other parse methods; however, we use the fact that boost::lexical_cast throws an exception to determine whether the array size is a number or a name. First I try the cast; if this throws a boost::bad_lexical_cast exception, we look to see if we have the value stored in our defines map.
#### Conclusions and future work

This works quite well and is fairly robust (as the glsl compiler will fail if the code is wrong anyway); however, as mentioned above, it doesn't cope with registering uniforms which use structures. I may refine this at a later date, but that would require storing each structure and each element of that structure, and registering every element.

This will eventually be made redundant by the new GLSL Uniform Buffer Objects; however, at present my Mac doesn't support them, so that will have to wait.

I may also just write a full GLSL parser using the excellent boost::spirit library one day as an exercise in parser writing (as I quite enjoy writing parsers).

This code has been integrated into ngl, and the latest source tree has been updated to include it all.