Using the Lotame Admin API to build a Dashboard – Part 1

Introduction

We here at Lotame like to think we have a pretty nice UI, and we love when our customers make good use of it. However, we also recognize that we are not able to anticipate every business need or display information in the exact way that each organization would prefer. Perhaps your business processes dictate that you need to see the data a certain way, or maybe you have an executive level dashboard that provides your CEO a snapshot of how the business is performing without requiring him or her to log into multiple individual applications to see those details.

Enter the Lotame Admin API. Every feature in our Crowd Control UI leverages this ReST based API. That means everything that you can do and every piece of data that you can access in our UI is also available via the Admin API. There are a number of APIs that customers can use, including the Audience Extraction API and Data Collection APIs. However, from here on out we will be discussing the Admin API exclusively.

Planning a Dashboard

In the introductory section I mentioned a pretty common scenario where a company either has, or wants to have, an executive dashboard that the CEO and others can leverage to get a snapshot of how the organization is performing. Maybe you are using some tools from Pentaho or TIBCO Spotfire, or even built something on your own utilizing tools like JSlate. I am going to show you how to use the Lotame Admin ReST API to add additional information to a corporate dashboard. In this case I will build a simple custom dashboard and add widgets that provide a list of:

  • the most recently created audiences
  • the 10 largest audiences by unique count over the last month
  • audiences that match some search criteria
  • month over month audience statistics, and
  • some financial information related to money earned from data selling compared to a financial target.

We could implement these examples in any language we choose – C#, Java, Ruby, PHP, Python, pretty much anything that can parse XML or JSON. While I am partial to Java, this time I am going to use Python to simplify the build and deploy process. I am also going to use JSON for retrieving data as it is generally much easier to use than XML. While there certainly isn’t any requirement to use object oriented code, I am going to go the “object” route here so that I can show you discrete bits of functionality and put together a good example that you can build upon. I’ll be using Python 2.7.x and will try to stick to standard libraries. I will also reference some links from our API documentation.

I’m actually going to split this tutorial into two parts so that I can offer as much detail as possible without being overwhelmingly long. I’ll start off by showing you how to get a token and then make some API requests and then I’ll tie it together by adding some UI code that actually calls these methods. At the end of Part 1 you will know how to call the API and make a widget that displays the data, and potentially pull that widget into any existing dashboard you may already use. Part 2 will focus on building more widgets and creating the dashboard using the same technologies.

Now let’s get started!

Connecting to the API

The first thing that you need to do when using the Admin API is authenticate yourself. Not everyone has access to the ReST API, so make sure to call your Lotame representative and ask for access to the Admin API. A couple of things to keep in mind when you do that:

  • You are probably better off asking for a new account for accessing the API than using an existing one. If you use an existing user account, then every programmer that works on the code or configuration for accessing the API will know that user’s password. Personally I would ask for an account like api-access@mycompany.com.
  • Having an account that can leverage the API doesn’t mean that you will have access to everything available in the API or mentioned in the documentation. Our API access works just like our UI access. For instance, if your company has access to create and manage audiences and view profile reports, but your contract doesn’t provide you with the ability to manage custom hierarchies, then that will be reflected in what you can do with the API as well.

Now that we have all of that out of the way, let’s create a connection class that we can reuse across all of our widgets. This class is responsible for reading the API username and password out of a configuration file, passing them to the API, and receiving a token that we will use for every successive call. The class will have methods for connecting, disconnecting, verifying the token is valid, and it will hold the token itself. There is a security section in our API Help that describes the token and connection process. (Please note that you will need an existing Crowd Control account to access the API Help.) I’m going to place the Connection class in a Python file called api.py. In the constructor, the SafeConfigParser calls open and read our configuration file to get the username (email), password, and base URL for the API. It is best to put these into a configuration file so that you can easily change any of these settings in the future (passwords never belong in code!). The last two statements of the constructor then set up our parameters and make the call to the API for our token. As you look at the code below you will notice that there are also two import statements, for urllib2 and json, that we haven’t fully talked about yet. We will come back to those.

api.py – Connection Class

from urllib import urlencode
from ConfigParser import SafeConfigParser
from urllib2 import urlopen, Request
import json

class Connection:

    token = ''
    base_url = ''

    def __init__(self):
        # This is where I will pull in the config file
        # Read out the email, password, and base url
        parser = SafeConfigParser()
        parser.read('config.cfg')
        email = parser.get('api_samples', 'email')
        password = parser.get('api_samples', 'password')
        self.base_url = parser.get('api_samples', 'base_url')

        # Now we can set up the email and password parameters we
        # need to send to the API to get a token
        params = urlencode({'email': email,
                            'password': password})  # urllib.urlencode
        self.token = urlopen(self.base_url, params).read()

The configuration file read by the constructor above should have a section titled api_samples with the base_url, email, and password options shown below. Remember to replace the placeholder email and password values with your own API username and password.

config.cfg

[api_samples]
base_url = https://api.lotame.com
email = USERNAME_FOR_API_CALLS
password = PASSWORD
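
If an option is missing from config.cfg, the parser only complains at the moment that option is read. As a minimal sketch (not part of the article's api.py), the settings could be checked up front. The names load_api_settings and REQUIRED_OPTIONS are hypothetical, and the import fallback is only there so the snippet runs on both Python 2.7 and Python 3.

```python
try:
    from configparser import ConfigParser  # Python 3
except ImportError:
    from ConfigParser import SafeConfigParser as ConfigParser  # Python 2.7

# Hypothetical helper names; the option names come from config.cfg above.
REQUIRED_OPTIONS = ('base_url', 'email', 'password')


def load_api_settings(path='config.cfg', section='api_samples'):
    """Return the API settings as a dict, or raise ValueError if the file,
    the section, or any required option is missing."""
    parser = ConfigParser()
    if not parser.read(path):  # read() returns the list of files it parsed
        raise ValueError('could not read %s' % path)
    if not parser.has_section(section):
        raise ValueError('missing [%s] section in %s' % (section, path))
    missing = [opt for opt in REQUIRED_OPTIONS
               if not parser.has_option(section, opt)]
    if missing:
        raise ValueError('missing option(s): %s' % ', '.join(missing))
    # raw=True turns off % interpolation, so a password containing '%'
    # is returned unchanged instead of raising an error.
    return dict((opt, parser.get(section, opt, raw=True))
                for opt in REQUIRED_OPTIONS)
```

Calling load_api_settings() once at startup gives a single clear error message when the file is incomplete, rather than a failure partway through the first request.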

That’s it to get a token for authorization! Without the comments, the config reading code, and the class setup, it was just a couple of lines to get the token. An important thing to keep in mind about that token is that it has a 30 minute expiration period. However, the expiration period is reset each time you use the token, so it expires 30 minutes after the last use. That means that if some time passes between connections you might want to re-authenticate to get a new token. If your token hasn’t expired yet and you make a re-authentication request, the API will return the same token information and reset the expiration time. Now that we have a connection, we can see how easy it is to make some service requests.
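
The sliding 30 minute window described above can be tracked on the client side. The following is only a sketch under stated assumptions: TokenCache is a hypothetical helper (it is not part of the Lotame API or of api.py), and it estimates expiry from its own clock rather than asking the service, so treat the result as approximate.

```python
import time


class TokenCache(object):
    """Hypothetical helper: hands out a token and fetches a new one once
    the token has sat unused for the idle window."""

    def __init__(self, fetch_token, max_idle_seconds=30 * 60, clock=time.time):
        self._fetch_token = fetch_token    # callable that returns a fresh token
        self._max_idle = max_idle_seconds  # 30 minutes, per the text above
        self._clock = clock                # injectable so it can be tested
        self._token = None
        self._last_used = None

    def get(self):
        now = self._clock()
        expired = (self._token is None or
                   now - self._last_used >= self._max_idle)
        if expired:
            self._token = self._fetch_token()
        # Every use resets the idle window, mirroring the sliding expiration.
        self._last_used = now
        return self._token
```

With the Connection class from this article, one way to wire it up would be TokenCache(lambda: Connection().token), calling get() before each service request.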

A Simple Service Request

We should now be able to make service requests using our new token. In this first example I simply plan to bring back a report of the top audiences for our client. The first thing that I want to do is check out the documentation for the Top Audiences Report. That page provides us the API endpoint and any parameters that we could pass to the API. I’ve included what it looks like as of the time of this article.

TopAudiencesReport

This snippet tells you that you can access the end point at https://api.lotame.com/audstats/reports/topAudiences and that by default it will return the top 100 audiences sorted by the count of uniques (cookies) from yesterday. You don’t need to pass any parameters, but if you want to change the number of records returned, the date range, or the sort order you can. Passing your client id (client_id) is not required, but you may want to use it if you have access to data for multiple organizations within Crowd Control. Now that we know what call to make, let’s write some code to make it.

I’m going to make a new class called ServiceRequests that I am actually going to place in api.py along with our Connection class as these two really do go hand in hand. Since all of our API requests in this article are going to be GET requests (we won’t be updating any records), the first thing we have in the ServiceRequests class is a method that knows how to make GET requests of the API, passing the token. The method also lets the API know we are interested in having JSON data returned, which is why we had an ‘import json‘ statement in the previous code block. By factoring this out we just won’t have to make these same calls over and over again for each of our service requests.

api.py – ServiceRequests Class

class ServiceRequests:
    def _make_get_request(self, rest_url, token):
        """Private method for making all of the actual ReST GET requests.
        The assumption is that all requests desire a JSON response."""

        # set the authorization with our token
        headers = {'Authorization': token}

        # We want JSON, if we don't set this we will get XML by default
        headers['Accept'] = 'application/json'

        json_to_return = json.load(urlopen(Request(rest_url,
                                    headers=headers)))

        return json_to_return
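
To check the headers without touching the network, the request can be built separately from the call that sends it. This is a small sketch, not a change to the class above: build_json_request is a hypothetical helper name, and the import fallback only exists so that the snippet runs on both Python 2.7 and Python 3.

```python
try:
    from urllib.request import Request  # Python 3
except ImportError:
    from urllib2 import Request  # Python 2.7


def build_json_request(rest_url, token):
    """Build (but do not send) a GET request that carries the token and
    asks for a JSON response."""
    headers = {'Authorization': token,
               'Accept': 'application/json'}
    # With no data argument the request method defaults to GET.
    return Request(rest_url, headers=headers)
```

The returned object can be inspected with get_method() and get_header() before it is handed to urlopen.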

Easy enough. Now we can use _make_get_request to help us get the top audiences. Staying in the ServiceRequests class, we can make our first real service call. The first time through this example I’m just going to show you how to make the service call using the default service settings. Later on we can pass some parameters. Let’s see what that looks like now.

api.py ServiceRequests Class – get_top_audiences

    def get_top_audiences(self,
                          connection):

        # Here we create the URL for the ReST request
        rest_url = connection.base_url + '/audstats/reports/topAudiences'

        # We can print out our URL to make sure we got it right
        print 'rest_url = %s' % rest_url

        # Make the request
        return self._make_get_request(rest_url, connection.token)

The first statement in the method builds the URL that we need for our request, and the final return statement makes the actual request. I stuck a print statement in there just so you could see the URL at run time. We still need to build something to parse the JSON that was returned and make it look nice in the browser, but as far as interacting with the API, that is pretty much it.

Building a Simple UI

The last thing we need to do is find a way to display the data. Most of this has little to nothing to do with the API directly, other than parsing the JSON, but we still need a way to display the data and complete our initial example. The UI I am going to build leverages two extra components – Flask and Bootstrap. Flask is a Python microframework that helps you build web applications. Bootstrap is from Twitter and is essentially a collection of CSS and HTML conventions / templates that allow you to (among other things) create consistent UIs. In our case, I use Flask for most of the processing and Bootstrap to style the UI.

app.py – top_audiences

from api import Connection, ServiceRequests
from flask import Flask, render_template
import os

# create the app
app = Flask(__name__)

# Create Connection and ServiceRequest objects
connection = Connection()
rest_request = ServiceRequests()

# The method and route for the top audiences
@app.route('/top')
def top_audiences():
    """This endpoint returns the top audiences by unique count for the client
    in question.
    For the ReST request we are using the defaults for sort attribute, sort
    order, the number of audiences to return, and the date range.
    """
    json_request = rest_request.get_top_audiences(connection=connection)

    # Get all of the audiences, their targeting codes, and the unique count
    # from the json
    audiences = []
    for aud in json_request['stat']:
        audiences.append({'AudienceName': aud['audienceName'],
                         'audienceTargetingCode': aud['audienceTargetingCode'],
                         'uniques': '{:,}'.format(int(aud['uniques']))})

    # Then we render the appropriate html page, passing the resulting info.
    return render_template('top_audiences.html', audiences=audiences)

# And then we add a bit of code for this to run on port 33911 (just the port
# I chose).
if __name__ == '__main__':

    port = int(os.environ.get('PORT', 33911))
    app.run(host='0.0.0.0', port=port)
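
The JSON handling inside top_audiences() can also be tried on its own, without Flask or a live token. The sketch below pulls that loop into a hypothetical format_audiences function. The sample payload is made up for illustration and only mirrors the field names that app.py reads; it is not real API output.

```python
def format_audiences(payload):
    """Turn the report JSON into the rows that the template expects."""
    rows = []
    # .get() with a default keeps an empty response from raising KeyError.
    for aud in payload.get('stat', []):
        rows.append({
            'AudienceName': aud['audienceName'],
            'audienceTargetingCode': aud['audienceTargetingCode'],
            # Add thousands separators so large counts are easier to read.
            'uniques': '{:,}'.format(int(aud['uniques'])),
        })
    return rows


# Made-up sample shaped like the fields app.py reads (an assumption).
sample = {'stat': [
    {'audienceName': 'Sports Fans',
     'audienceTargetingCode': 'sports_fans',
     'uniques': '1234567'},
]}

rows = format_audiences(sample)
```

Keeping the formatting in a plain function like this makes it easy to check the output before wiring it into a route.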

In the return statement at the end of top_audiences() above, you will notice that we render our data into a page called ‘top_audiences.html’. That page is defined below. It in turn extends an HTML file (layout.html) that pulls in CSS files from Bootstrap. All of the class elements that we are leveraging, such as class=”brand” and class=”navbar”, are defined in Bootstrap.

top_audiences.html

{% extends "layout.html" %}
{% block body %}
<div class="span6 well well-small">
  <div class="navbar">
    <div class="navbar-inner">
      <div class="container"><a class="brand" href="#">Top Audiences</a></div>
    </div>
  </div>
  <table class="table table-striped table-condensed">
    <thead>
      <tr><th>Audience Name</th><th>Targeting Code</th><th>Size</th></tr>
    </thead>
    <tbody>
    {% for audience in audiences %}
      <tr>
        <td>{{ audience.AudienceName }}</td>
        <td>{{ audience.audienceTargetingCode }}</td>
        <td align="right">{{ audience.uniques }}</td>
      </tr>
    {% else %}
      <tr><td colspan="3"><em>No audiences exist.</em></td></tr>
    {% endfor %}
    </tbody>
  </table>
</div>
{% endblock %}

I have included the layout HTML file here too for reference, so that you can see how I pulled in the Bootstrap CSS. You will notice that you could also have defined your own CSS (the commented-out style.css link) and not used Bootstrap at all.

layout.html

<!doctype html>
<title>Top Audiences</title>
<!-- <link rel=stylesheet type=text/css href="{{url_for('static', filename='style.css')}}"> -->
<link href="{{url_for('static', filename='bootstrap/css/bootstrap-custom.css')}}" rel="stylesheet" type="text/css" />
{% for message in get_flashed_messages() %}
<div class="flash">{{ message }}</div>
{% endfor %}
<!-- <body class="container"> -->

{% block body %}

{% endblock %}

The Results So Far and What Comes Next

Thus far this article has provided you with some of the basics. We created a way to authenticate, call the web service, and show the results in a UI. If we just stopped there, then our UI would look something like the screenshot below.

Simple Top Audiences

You could write something like this and then drop it into an IFrame on a dashboard, or pull it in through some other mechanism your dashboard tools support. This is repeatable for every end point available in the API, including those that allow you to manipulate (create, edit, or delete) data. What we haven’t done yet is look at an example that shows you how to manipulate the API call further, allowing you to change the date range, the number of audiences you return, or the sort order. The next couple of sections will demonstrate how you can tweak the code above to allow more flexibility. Eventually I’ll show you how I implemented a simple dashboard.

Tweaking the Service Call with Parameters

In the examples above we looked at how to leverage the service call, but we left it using only the defaults. That meant that the call brought us back the top 100 audiences, which isn’t very likely to fit nicely into a dashboard. What I will show you next is how to use parameters with the service call to bring back only the top 10 audiences. First, refer back to the API snippet we included above. It tells us the top audiences endpoint accepts a parameter called page_count, which is set to a default of 100. All we need to do is set that parameter to 10 in the service call to get back what we want. That would make the resulting web service call look like:

https://api.lotame.com/audstats/reports/topAudiences?page_count=10

In our code for the ServiceRequests class, we just need to alter the get_top_audiences method to set that parameter before making the request. The code sample below shows how I modified get_top_audiences so that UI components can make dynamic requests specifying the sort order, the audience attribute used for sorting, the client id, the number of audiences to return, and the time period to use for determining the size of the audiences. The method itself sets some defaults, including 10 for the audience count, that are then used in the service call. When we look at the entirety of api.py so far (this includes both the Connection class and ServiceRequests class) we end up with:

api.py – Dynamic Requests

from urllib import urlencode
from urllib2 import urlopen, Request
import json
from ConfigParser import SafeConfigParser

class Connection:
    token = ''
    base_url = ''
    def __init__(self):
        # This is where I will pull in the config file
        # Read out the email, password, and base url
        parser = SafeConfigParser()
        parser.read('config.cfg')
        email = parser.get('api_samples', 'email')
        password = parser.get('api_samples', 'password')
        self.base_url = parser.get('api_samples', 'base_url')
        # Now we can set up the email and password parameters we
        # need to send to the API to get a token
        params = urlencode({'email': email,
                            'password': password})  # urllib.urlencode
        self.token = urlopen(self.base_url, params).read()

class ServiceRequests:
    def _make_get_request(self, rest_url, token):
        """Private method for making all of the actual ReST GET requests.
        The assumption is that all requests desire a JSON response."""

        # set the authorization with our token
        headers = {'Authorization': token}

        # We want JSON, if we don't set this we will get XML by default
        headers['Accept'] = 'application/json'

        json_to_return = json.load(urlopen(Request(rest_url,
                                                       headers=headers)))
        return json_to_return

    def get_top_audiences(self,
                          connection,
                          sort_order='DESC',
                          sort_attr='uniques',
                          client_id=0,
                          audience_count=10,
                          date_range='LAST_30_DAYS'):
        """Returns the list of top audiences.

        Keyword arguments:

        connection - the api connection object, needed to gain access to the
        api token

        sort_attr - the audience attribute that you want to use as the sort key
        The default is uniques

        sort_order - the order for the sort (ASC, DESC).  Default is DESC

        client_id - the client id for the client that owns the audience. Client
        id can be optional if you only have access to a single client or if you
        want to return all audiences to which you have access.  The default is
        0, which leaves client_id out of the request.

        audience_count - the number of audiences you want to return.  The
        default from the service is 100, the default for this code is 10.

        date_range - the date interval that you want to use for any statistics.
        The default is LAST_30_DAYS, but the service also supports YESTERDAY,
        LAST_3_DAYS, LAST_7_DAYS, MONTH_TO_DATE, and LAST_MONTH.
        """

        # Build up our parameters list for the call
        params = ['sort_order=%s' % sort_order,
                    'sort_attr=%s' % sort_attr,
                    'page_count=%s' % audience_count,
                    'date_range=%s' % date_range]        

        # Add the client ID, if we have it
        if client_id:
            params.append('client_id=%s' % client_id)

        # Here we create the URL for our ReST request with our params
        rest_url = connection.base_url + '/audstats/reports/topAudiences?' + \
                    '&'.join(params)

        # We can print out our URL to make sure we got it right
        print 'rest_url = %s' % rest_url

        # Make the request
        return self._make_get_request(rest_url, connection.token)
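
As a variation on the hand-joined query string above, the same URL could be assembled with urlencode, which also escapes any characters that are not safe in a URL. This is a sketch rather than a replacement for the method: build_top_audiences_url and DATE_RANGES are hypothetical names, and the list of date ranges is taken from the docstring above, so the service may accept values that are not listed here.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7

# Date ranges named in the docstring above (may not be exhaustive).
DATE_RANGES = ('YESTERDAY', 'LAST_3_DAYS', 'LAST_7_DAYS', 'LAST_30_DAYS',
               'MONTH_TO_DATE', 'LAST_MONTH')


def build_top_audiences_url(base_url, sort_order='DESC', sort_attr='uniques',
                            client_id=0, audience_count=10,
                            date_range='LAST_30_DAYS'):
    """Return the top audiences URL with its query string."""
    if date_range not in DATE_RANGES:
        raise ValueError('unsupported date_range: %r' % date_range)
    # A list of pairs (rather than a dict) keeps the parameter order stable.
    params = [('sort_order', sort_order),
              ('sort_attr', sort_attr),
              ('page_count', audience_count),
              ('date_range', date_range)]
    # Only send client_id when one was actually supplied.
    if client_id:
        params.append(('client_id', client_id))
    return base_url + '/audstats/reports/topAudiences?' + urlencode(params)
```

Rejecting an unknown date_range before the call is a design choice of this sketch: it surfaces a typo locally instead of waiting for the service to respond.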

This method might seem like it just got a lot more complicated, but it really didn’t. Most of the extra code is either the docstring or is just there to build up the URL for the API request. The biggest thing to note is that the method signature now allows you to pass in all of the parameters or use the defaults to build out the service call (the rest_url assignment, split over two lines). Nothing at all needs to change in our app.py code, although we could alter it to override one of the defaults in api.py by making the JSON request look like this:

json_request = rest_request.get_top_audiences(connection=connection,
                                               client_id=1324,
                                               sort_order='DESC')

Then we just have one tiny change in top_audiences.html to reflect the fact that we are only returning the top ten audiences. The italic subtitle under the “Top Audiences” heading is all that is new.

{% extends "layout.html" %}
{% block body %}
<div class="span6 well well-small">
  <div class="navbar">
    <div class="navbar-inner">
      <div class="container">
        <a class="brand" href="#">Top Audiences</a>
        <em>(Top 10 Audiences by Unique Count)</em>
      </div>
    </div>
  </div>
  <table class="table table-striped table-condensed">
    <thead>
      <tr><th>Audience Name</th><th>Targeting Code</th><th>Size</th></tr>
    </thead>
    <tbody>
    {% for audience in audiences %}
      <tr>
        <td>{{ audience.AudienceName }}</td>
        <td>{{ audience.audienceTargetingCode }}</td>
        <td align="right">{{ audience.uniques }}</td>
      </tr>
    {% else %}
      <tr><td colspan="3"><em>No audiences exist.</em></td></tr>
    {% endfor %}
    </tbody>
  </table>
</div>
{% endblock %}

Following all of the instructions and code samples above will result in a widget that looks like the live example we have hosted at the Lotame API Samples site in Heroku. The sample one should certainly have different data than yours, as you will be using data from your own client.

Next Steps

In the next tutorial, I will show you how to make several other API calls to return additional information and then we will tie it all together by creating a simple example dashboard. The dashboard will contain several pieces of information, including some example Key Performance Indicators (KPIs) that could be used to better understand how your business is tracking against goals. I look forward to writing that tutorial and to having you come back and read that one as well.

Data – The Foundation of the Internet Food Pyramid

Learning From the Past

I started working 20 years ago, and while times were very different (Internet access was pretty limited, with only 3 million users worldwide, and nobody knew what software as a service would really mean) some things haven’t changed that much.

My first job was with an engineering company, and one of the things that we did there was to help electric providers access and manage their data.  All of these organizations were responsible in some way for generating piles of data.  Not only the information about how their facilities were licensed and built, but notes, documents, specifications and every bit of information you could imagine related to maintaining, operating, and modifying the facility.

It was great: these companies had tons of data about everything they had ever done – they had the power of data (pun intended). Well, in theory anyway. The problem for most of these companies was twofold. First, most of them did not have direct access to this data. Even though it was about them, many times it wasn’t considered theirs. The engineering and maintenance firms either considered it their own or convinced the power providers that they (the engineering companies) were in the best position to hold onto this data, usually in systems to which only the engineering firms had access.

At best the electric provider would have a room full of paper documents that reflected the work that had been done.   This made getting access to that information a time consuming, and sometimes frustrating and costly exercise.  This led to the second problem – this data was primarily at rest.  In other words, since accessing the data was time consuming and costly, no one had any incentive to use it to do anything until they absolutely needed to, so the data just sat there most of the time.

Some of our more forward-thinking customers leveraged our services to address these issues, building systems where they could capture, store, retrieve, and more effectively use all of this data. That gave them real power. Not only were they in a better position to respond to both internal and external requests, they were also in a much better position to proactively improve their facilities and processes in ways that they had not previously had the stomach to attempt due to the extra time and costs.

Apply Those Lessons Now

Everything happens in cycles, and looking at businesses on the Internet today we see some of the same problems. Sure, we have widespread connectivity, new tools, high-powered servers and software capable of creating and churning through terabytes of data, but what is anybody doing with it? Many companies out there find they are not in that different a position from those electric providers of 20 years ago, collecting piles of data in any number of systems or logs. But once it is there, what is anyone doing with it? What are you doing with it?

Take Charge of Your Data

At some point in the lifespan of any company you need to be asking yourself about your data strategy. You need to think about all of the opportunities that data can afford you, from providing a better service, to attracting more customers, and ultimately improving the bottom line of your company. Chances are that if you are not answering these questions, one of your competitors is giving them some consideration, and while I for one do not put much stock in the “competitor x is doing y, so I need to as well” line of thinking, ignoring your data strategy is doing a disservice to both you and your customers.

Today there are any number of tools and vendors out there that will tell you that they can help you with these kinds of data management and usage problems. One tool that you can add to your arsenal for executing against a data strategy is a Data Management Platform (DMP). “Great…” you are probably thinking, “another buzzword and another vendor to worry about. What is a DMP and how can one help me anyway?”

Pick the Right Tools and Partners

First, I would suggest that you do not view whomever you choose to work with as a vendor, but rather that you view them (and they view you) as a partner.  Now that we got that out of the way, I am going to ignore any number of the definitions that you might be able to find out there from vendor or industry sites, and I’m going to tell you what I think it is and what it can do to help you.

I view a DMP as a platform that helps you:

  • Determine how your customers or users interact with you through any number of contact points, including a desktop browser, a mobile device, or even other applications you may use in your enterprise;
  • Combine this data with data you may have from other platforms such as a CRM or POS or third party data;
  • Allow you to view that data in a way that makes sense to you;
  • Analyze and report on that data so that you can visualize the relationships between the various ways that your users interact with you;
  • Use all of that data to create a better user experience for that user, whether that be showing them more targeted content, serving them a more relevant ad, or even providing them a better purchasing or call center experience;
  • Use the data any way you see fit – it is your data after all.

You may choose to use any number of those types of features in combination or not use any of them at all, but a DMP should ultimately provide those options so that it can support your organization as it grows and evolves. Elaborating a bit on the points above, a DMP shouldn’t be a closed system that collects your data and just sucks it into a black box where you don’t get much, if any, choice of exactly how to view it or use it. Instead a DMP should be an open platform that allows you to combine data from multiple sources, and allows you to use that data any way you want through APIs and integrations. You don’t want to be one of those electric providers mentioned above that always had to ask the engineering firm for permission to get their own data. Instead you want to be that company that is in charge of its own data and strategy. You should be in charge, and you should have options to control the data that is collected, the classification or categorization and lifespan of that data, and the use of the data. A system that provides defaults for you is great. But a system that has defaults and provides you the option of customizing them is even better. One that offers all of that, is easy to use, and offers some level of transparency into how the system works is best.

Remember Privacy

And let’s not forget the ePrivacy Directive in Europe, which makes ownership of all data emanating from a given site the responsibility of that site’s owners.  If you do not control your data, but cede even a small amount of control to other companies who pixel your site, the ePrivacy directive is clear – this will become your liability.  It is vital to understand all of the tracking on your own site.  Work with the partner you choose to set up a system to regularly monitor and audit all the code on your sites.  This is more than just a “cookie audit.”  Much of the tracking covered by the ePrivacy Directive doesn’t use cookies at all.  You need to know the actual scripts that run on your pages. If you haven’t obtained a full tracking audit recently, be sure this is your first step.  You’ll be surprised by the results.  Once complete, you’ll need to categorize each tracker as essential or non-essential, and then rank them on a scale of relative intrusiveness.  Bake this into your data strategy at the onset of your planning.

Put Your Data to Work

Once you have a data strategy and start working with a DMP, then you can put that data to work for you, moving from a state of rest to a state of competitive advantage.  Build a strategy, pick a DMP partner, and wow all of your customers through providing them great content, meaningful targeted ads that they are more likely to find interesting, or even person-to-person customer support.

Choosing a Portal Platform

Why Look at Portals?

Along with our recent effort to build a collection of common/shared Identity and Security Services for our products leveraging an LDAP / Directory server (see posts on LDAP comparison) and a security framework (more on this in another post), we are also working to carry the service paradigm forward into the user interface.   I don’t feel like portals really get as much attention as they deserve in the service oriented architecture (SOA) world, while other things that aren’t really service oriented at all end up being associated with the SOA buzz (see my thoughts on Service Oriented Architecture).  Having said that, it seems to me that portals can provide an especially good representation of services and allow for aggregate applications in a way that most other presentation options don’t.  You can really gain quite a bit of flexibility by creating portlets that leverage either one or many services.

The rest of this post will talk about some of our high level requirements for choosing a portal and what we evaluated. I’ll tell you what made the short list, what didn’t and why, and then follow up with another post that discusses our short list candidates and how we picked a winner.

The Requirements

Our overall goal is to be able to use the portal to create a flexible product that we then deliver over the internet.  We don’t sell the product itself – we only sell the services that our products deliver, and then the ability to get to our data using our flexible front end (soon to be available in a portal).  Some of our requirements should be things that all people consider, while others, such as the fit to our particular architecture and technology stack, will be more specific to us.  Keeping that in mind, here were some of our high level requirements:

  • Standards are important, so we want JSR-168 / JSR-286 and WSRP support. WSRP support, in my opinion, is better even than the JSR support, because it allows for a better dispersal of portlets and services. Each JSR portlet needs to reside on the portal server, whereas only one WSRP implementation need be resident on the portal – all the WSRP portlet “guts” can be anywhere else. Having said that, the WSRP and WSRP2 portlet implementations are normally JSR compliant portlets themselves.
  • Support for LDAP directly or through a security framework, with as little duplication of storage as possible.  Plus, the LDAP server should be the source of as much of the identity data as possible.
  • Customization and configurability are key.  The end user need not even know they are really using a portal … at the very least, the portal should be branded and have a look and feel that says it is a Lotame product and not look like someone else’s.  We may also need to change the layouts, the login, and remove components that we don’t need.
  • Ease of use.  This falls into two categories – end users and developers.
    • End users need to be able to use whatever we create.  While much of that falls to us, the portal framework also plays a significant role.
    • Developers should be able to figure out how to use the portal, how to configure it, how to customize, add new pages, content and portlets without having to spend days and days just to get started.  While it may take a while to become a portal “expert”, you should be able to get up and running in a couple of hours.
  • It should fit nicely into our technology stack. We run a Linux and MySQL environment here. Anything that doesn’t run on those is out! The bulk of our development is done in Java and PHP and we prefer not to add any new languages if it can be avoided. You can begin to see a trend here: we also prefer open source products, so we will be sticking with products that are either open source or otherwise very affordable.
  • It should also fit into our architectural roadmap. At a high level we are choosing a portal because conceptually it does fit with our vision of creating aggregate applications through use and reuse of services, but the specific portal we choose needs to be a good long term choice based on where we are taking our entire product line.
  • Some document management or CMS ability would also be nice to have.
  • Good documentation and community support.

The Contenders

We investigated or otherwise considered the products below.  For those of you who read this and think … “but there is a more recent version out now”, I recognize that, but this evaluation was done essentially in November of 2008.  There are also many other portals and portal frameworks out there, including ones for pay such as Vignette, BEA, IBM, Oracle and others.  The portals below were chosen because 1) they met our requirements above and 2) we were aware of them.

Product                              Version    Link to Site
Jahia Community Edition              5          http://www.jahia.com/jahia/Jahia
Jetspeed                             2          http://portals.apache.org/jetspeed-2/
JBoss Portal (Community Edition)     4.7        http://www.jboss.org/jbossportal/
eXo Portal Community Edition         2.2        http://www.exoplatform.com/portal/public/en/
Light Portal                         ? Pre-1.0  https://light.dev.java.net/
Gridsphere                           3.1        http://www.gridsphere.org/gridsphere/gridsphere
Liferay Portal Standard Edition      5.1.2      http://www.liferay.com

The First Pass

Coming into the evaluation I was the most excited about 3 portals – Jetspeed, eXo, and Liferay. This was primarily due to some previous experiences as well as just a general respect for each of these communities. I really didn’t know enough about the others to render much of an opinion one way or another, with the possible exception of JBoss, since we run the JBoss App Server and I knew that there was (at some point at least) some work done by Novell on the JBoss Portal. In any case, I went and downloaded each of the portals and gave them a try. In many cases I also encouraged some of our developers to do the same so I could also learn from their experiences.

Jahia Community Edition

Jahia is a very nice product that meets most of the requirements that we have, including very good support for content management, JSR-168 portlets, good administrative tools, and good support for developers including support for Maven and Ant. Jahia has a very strong offering, with only one problem for us … no support for LDAP in the Community version. Seeing as how LDAP integration was one of our stated high level requirements, this was a problem. Also, since we did not find any information readily available about integrating a security framework such as Acegi / Spring Security with Jahia (which would allow for LDAP integration via one level of indirection), this pretty much sealed Jahia’s fate.

Regardless of our decision, Jahia is a very good product.  You can find more information about its features here.

Light Portal

Per the website:

Light is an Ajax and Java based Open Source Portal framework which can be seamless plugged in to any Java Web Application or as an independent Portal server. One of its unique features is that it can be turned on when users need to access their personalized portal and turned off when users want to do regular business processes.

This is true. It is a framework that shows some promise, but it was a little too light for us. In other words, it left us with a lot to do and it didn’t have any CMS capabilities, LDAP integration, etc. It would seem that a lot of new work is going on and I just read a message here at the end of December that said 1.0 is on its way. Not right for us, but I am interested in what happens with Light and I’ll keep watching.

Gridsphere

I ended up not spending too much time looking at Gridsphere. It seems pretty straightforward to set up and use, but the feature base didn’t have everything I wanted (again LDAP, CMS). There is some documentation out there that seems to be accurate, but honestly, there isn’t that much of it. Gridsphere also was not right for us.

eXo Portal Community Edition

I was pretty excited about eXo. When I was doing my first portal implementation in 2003 or so eXo wasn’t quite what we needed, but it was really coming along and I went into this evaluation looking forward to using it. I downloaded both the All in One bundle as well as the 2.2 Portal Tomcat binary (both from here). Getting things up and running with these packages was very straightforward, and eXo looked really great right out of the box. We even took a look at the WebOS / desktop portion of the product in the All in One package, and it looks really cool (not something we need right now, but it works). From a feature standpoint eXo looks really strong, particularly in their ability to deal in a flexible manner with LDAP models. If you don’t have a directory tree already set up, then it will create a tree that supports what it needs to do. If you have a tree already, eXo can be configured to work with it (see the LDAP config info). eXo also has great standards support, including JSR-168 / JSR-286, WSRP 2, JSR-170 (for content), and supports the JAAS security framework. On top of that it supports AJAX, a REST API, drag and drop, customization, and bunches of flexibility.

eXo looked great and looked like it would make our short list for sure … until we tried to really use it. I was continuously stumped by the interface when I was trying to create new pages, new portlet instances, and basically place our applications into the portal. The interface was not working in any way that I expected and I could not seem to create a new instance of a portlet. Assuming it was just me being thick, I had several other developers install it and take an independent look before I made any significant comments. Each one had the same issue – they thought it was easy to install and looked great, but couldn’t figure out how to create new portlet instances. I was sure that we were missing something in the setup, or missing something very simple right in front of our faces. The documentation we were looking at was not able to help us along any.

Unfortunately, that led to eXo not making our short list.  I know there are bunches of people out there using eXo, and I am sure it is a fantastic platform and I can see all the work that has gone into it.  The problem is that our experience violated one of our high level requirements – that it be easy for developers to get up and get started without being a portal expert.  I am willing to chalk our experience up to some lack of understanding or missing of the obvious on our part, and certainly look forward to any info that we might get from eXo or eXo users.  While ultimately our choice has already been made, I want to understand what went wrong with our look at eXo and would be willing to consider it in the future.

Jetspeed 2

My personal experiences with portals have been very heavy in Jetspeed (and the implementation from Gluecode, which is now part of IBM – the Gluecode portal product no longer exists to the best of my knowledge) and Vignette, so I was also excited about Jetspeed. If you want standards compliant software that is working to make sure to support integration with other Apache based software, then Jetspeed is it. Jetspeed has support for most of the standards and supports LDAP quite nicely (they work well with ApacheDS). I did have some questions about how flexible the support for LDAP is if I have a schema that differs significantly from the default LDAP schema. It looks like there were custom objectclasses and attributes, which is something we were looking to stay away from.

In the end, Jetspeed 2 did not make our short list for a couple of reasons.

  • I found the existing workflow for creating portlets and pages to be a little more cumbersome than I had expected.  I worked extensively with Jetspeed 1, and a little with Jetspeed 2 when work on it was just starting.  I found the placement and workflow of the administrative functions to be cumbersome and not intuitive to any of the developers that looked at it.  I knew what to do based on previous experiences, but the other folks struggled a bit.  I realize we could customize it to make it better, but I want to focus on working on the look and feel and our own apps, which is what our users are going to see.  I don’t want to spend a ton of time working on the administrative interfaces above some limited identity type of work we will have to do in any portal.
  • The documentation is OK, but not great.

I love Apache, and have used many Apache products, but two areas where I personally feel many Apache products come up short are presentation / UI (the first point above) and documentation (second point). This isn’t true for all Apache products, and it has gotten somewhat better, but documentation and UI are just not a major strength of some of these folks. That is one reason why I liked when Gluecode implemented a commercial version of Jetspeed – they cleaned it up a bit. I also had some concerns about the flexibility of the LDAP model, but I didn’t dig deep enough to be able to address those concerns.

Apache is great, Jetspeed is great, but Jetspeed 2 did not make our short list based on our high level criteria.

The Short List

So, what did make our short list and why?  Well, the short answer is that Liferay and the JBoss Portal (Community Edition) were our short list candidates.  Liferay had always been a contender, but JBoss Portal made the short list toward the very end of the evaluation.   Both of these products have good documentation, good support for standards, support for LDAP and security frameworks, and strong community support.

I will be providing some more detailed information on why they made the short list and what portal we eventually chose, but this post has gotten quite long, so I will be providing that information in a follow-up post in the next couple of days (probably after the new year).

Comparison of LDAP / Directory Servers – Update

Almost two months ago I wrote a post about some directory servers I was testing, mostly covering some early testing that I had done with OpenDS and OpenLDAP. Those test results showed OpenDS performing better than OpenLDAP in an out of the box testing scenario. I got some feedback from different folks, including Howard Chu who has been involved with OpenLDAP. While I didn’t follow up directly with Howard on his tuning comments, I did do some tuning of both OpenLDAP and OpenDS. I don’t have all of the test results in a presentable format, but I do have some additional findings.

Improving Performance

Both of these directory servers come tuned for developer use out of the box, which is to say that they are not really tuned in any way at all. Instead they are configured to use as small a footprint as possible. This makes a lot of sense, since the developers have no idea how much memory or processing power you have and make an assumption that the first time you use it you are trying it out in a development or test environment.

Once I spent some more time on the OpenDS and OpenLDAP sites and tweaking the configuration of each, I was able to show improved performance in each.  Given the nature of our implementation, only a couple of hundred records right now and a fairly low number of requests, the performance difference between the two was negligible.   It is possible that we might see some more significant difference with a larger number of requests and more entries.

You can find more tuning information for OpenLDAP at: http://www.openldap.org/faq/data/cache/190.html

More tuning information for OpenDS is here:  https://www.opends.org/wiki/page/HowToTunePerformance

The Verdict – Take 2

Given the results were so close, did that alter my preference for OpenDS?  Nope.  We have been very happy with the test results and features from OpenDS.   OpenDS also fits very well into our architecture and technology stack.  Personally I am very comfortable with the tools and documentation for OpenDS, and the OpenDS team continues to improve both.

Final Thoughts

OpenDS works very well for us and matches what we were looking for very well, both from a technology standpoint and a community standpoint. The OpenDS developers and community members are all very friendly and helpful. They continue to make improvements in the software and documentation.

Having said that, there may be reasons why you would choose one of the other directory servers, so while you may use my experience as a guide, make sure that you compare the features, technology stack, and architecture to your own requirements.

I would recommend evaluating not only OpenDS, but also OpenLDAP, ApacheDS, and others such as Red Hat / Fedora Directory Server. If you are in a Windows shop, any of the LDAP servers will work for you, but certainly Active Directory should be considered. I also have a high level of respect for Novell’s eDirectory. If you have a very large deployment, eDirectory might be something you really want to consider. Keep in mind that Active Directory and eDirectory are both LDAP-compliant servers that offer features beyond an LDAP server, and may in fact differ from the LDAP specification in some areas.

Lotame is Looking for a MySQL DBA / Data Architect

We are currently looking for a MySQL DBA / Data Architect at Lotame. The job description is below and can also be found on the careers page of our web site, where you can also apply and upload your resume.

MySQL DBA/Data Architect

Lotame is looking for a data mastermind to build the next generation of internet marketing technology specifically geared towards social networking websites like MySpace and Facebook. If you like to be intimately involved in the design and development of a company’s core technology, want to own projects and make a real difference in a small company, then Lotame wants you.

Candidates for this position must be thoroughly knowledgeable in database technologies, with specific knowledge of MySQL in a Linux environment.

The day to day:

  • Manage MySQL on Linux in Production/QA/Development environments utilizing open source technologies including LVM and DRBD.
  • Support system administrators and developers to ensure optimal system design
  • Ensure database systems perform reliably
  • Lead the design of all aspects of the data architecture, including data models, data flows, aggregations, clustering, replication, and backup and recovery strategies
  • Develop and enforce enterprise wide database design standards
  • Support team in troubleshooting efforts

Qualifications:

  • BS in a technical field or significant relevant experience
  • 4+ years of database experience
  • Proficient in SQL
  • Experience in database performance tuning, backup and recovery, system architecture and design with MySQL
  • Strong understanding of system administration fundamentals in a Linux environment.
  • Ability to write scripts to support database administration activities.
  • Experience working with very large data sets
  • Experience with data sharding/partitioning is a major plus
  • Experience with MySQL stored procedures, functions, and triggers
  • Experience managing large, dynamic storage using LVM.
  • Knowledge of multiple MySQL storage engines
  • Self-motivated with strong problem solving and analytic skills
  • Ability to handle multiple tasks

Comparison of Directory / LDAP Servers

A couple of weeks ago I began the exercise of investigating Directory Servers again. This post provides some background on my previous experience, plus what I have found in my current investigation thus far.

The Past

I have some experience with Directory Servers from around 2001-2004, when I went through a very similar effort compared to what I am doing now, including a code evaluation of those that were open source. At that time it really came down to whether or not it was appropriate for the company to use either LDAPd (now Apache DS) or OpenLDAP. It came down to these two based primarily on the fact that they were more open or easier to install than some of the other contenders. We looked at Sun Directory server, MS Active Directory, and Novell eDirectory, among others. At the time the company I worked for delivered a Windows based solution, but Active Directory just did not meet our needs, and the Sun Directory Server proved troublesome to install properly in our environment. The Novell offering, while strong, relatively easy to install, and affordable (free for the first 1MM nodes at the time – we weren’t going to have that many) wasn’t the politically acceptable option and didn’t quite match up to our particular needs.

That left us with OpenLDAP and LDAPd, in part because they both supported database backends (we had a particular reason for requiring that). Personally, I really liked LDAPd and offered a few code snippets to the project where I found some issues around the database backend modules. I very much liked the idea of a true directory server that supported many different types of backends based on need, which I believe is what led Apache DS to be the encompassing project it is now. The issue at that time was really more about maturity – was this project going to be able to support us moving forward? On the other hand, OpenLDAP had history, a large install base, and unfortunately, seemed to have developers that honestly couldn’t have cared less about supporting the Windows environment. On checking out the code, I found that it was impossible to build within Windows as it was. That meant sussing through C conditional compile statements, work that I did not much relish but nevertheless took on. After some time fooling around with the code I did get it to compile, and was able to get the backend components to work with Oracle and MS SQL server as well.

While not thrilled about my Windows based experience with OpenLDAP, it was the winner.  It had the features we needed and it was solid and trusted.

The Situation Now

Now several years later I am back undertaking a similar effort at a different company and with a different set of requirements.  I am no longer constrained by the Windows environment as we run our entire environment in Linux.  I also don’t have the need (as far as I can tell right now) to support a database backend.  So, here we are again – OpenLDAP is still the same stalwart it has always been, Active Directory still doesn’t meet our needs (for different reasons), and I didn’t even bring up Novell this time (although I am still a fan).  The playing field has changed though – LDAPd is now ApacheDS, with a larger supporting cast, an increased scope, and a changed architecture.  The new kid on the block is OpenDS, which has come out of the Sun world, has had a 1.0 release this year, as well as some more recent 1.1 builds.  Given our platform and our requirements we decided to take a look at ApacheDS, OpenLDAP and OpenDS.  After some quick conversations with other folks out there, I decided to defer my test of ApacheDS and wait for some additional work to be done on the 1.5 branch to get some additional performance and “ready for prime-timeness” (I am happy to hear from the Apache DS folks if they believe I should reconsider).  So at the end of the day, I am off to install OpenDS and OpenLDAP ….

The Install

My test environment is CentOS, and I created two virtual servers on the same physical box with the exact same CPU and memory configurations.  The intent was to work with a more or less default install of each of the servers without performing any tuning, tweaking, or custom building.  CentOS always tends to be a few package versions back in their repositories, but I decided to use what was there for the sake of sticking with the rules above.  On the OpenDS side, I started with OpenDS 1.0, instead of the new builds in OpenDS 1.1.

From an install standpoint, OpenDS seemed quite a bit easier to me, even though I had never seen it before. OpenLDAP was just exactly how I remembered it. I did take the installs a bit further and went to set them up to use TLS for security. Again, OpenDS was much easier to set this up with. I needed to do a lot more configuring with OpenLDAP and ended up deferring until later (we didn’t really need it for the test, or for our environmental set up in production either … for now).

In terms of features, well again, LDAP is mostly LDAP.  However, one feature that I very much liked (and we need) from OpenDS was virtual attributes, specifically isMemberOf.  This is EXTREMELY useful to us, and the default install of OpenLDAP from the CentOS repos doesn’t have it.  To be fair, if I were to build the latest versions of OpenLDAP that are available, they do have the memberOf feature.  Based on the fact that we do not want to get into maintaining and building 3rd party code whenever possible, this ended up being a ‘+’ in the OpenDS column and a ‘-‘ in the OpenLDAP column.
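To make the appeal of isMemberOf concrete: without it, a client that wants to know a user’s groups has to search the group entries and invert the member lists itself. The sketch below shows that inversion in plain Python. The DNs are made up, and the code only illustrates what the virtual attribute gives you for free; it is not how OpenDS implements it, and it treats DNs as exact strings, which is a simplification.

```python
def member_of(groups):
    """Invert {group_dn: [member_dn, ...]} into {member_dn: [group_dn, ...]}."""
    result = {}
    for group_dn in sorted(groups):
        for member_dn in groups[group_dn]:
            result.setdefault(member_dn, []).append(group_dn)
    return result


# Hypothetical directory contents, for illustration only.
ALICE = "uid=alice,ou=people,dc=example,dc=com"
BOB = "uid=bob,ou=people,dc=example,dc=com"
ADMINS = "cn=admins,ou=groups,dc=example,dc=com"
REPORTS = "cn=reports,ou=groups,dc=example,dc=com"

groups = {
    ADMINS: [ALICE],
    REPORTS: [ALICE, BOB],
}

memberships = member_of(groups)
for user_dn in sorted(memberships):
    print("%s -> %s" % (user_dn, memberships[user_dn]))
```

With a server-side virtual attribute, the same answer comes back as part of a single read of the user entry, which is why it mattered to us.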

The verdict: I installed OpenDS using both the GUI installer as well as the command line interface, and used the command line interface for OpenLDAP. All things considered, setting up one LDAP server feels much like setting up any other; it isn’t that hard. In the end though, I just felt OpenDS was easier, and would certainly be more straightforward for someone without much background in the technology … and it had the nifty isMemberOf attribute we wanted.

Testing by Hand

First I just wanted to make sure that the servers did what they should do. I should be able to use any LDAP client to add, remove, search, and modify LDAP entries. Everything worked fine using the command line tools, and mostly OK using available GUI LDAP clients. My client tool of choice is the Apache Directory Studio client. For the purposes of this initial set of tests we decided to go the simple route – we won’t have thousands of objects in the LDAP store, maybe one day, but now we are talking about hundreds. I only created a handful of users, groups, and orgs for this test. The only caveat I had in my testing with this client was that I had some very specific problems connecting to the OpenDS server using a startTLS connection from a Linux client. I did not have the same issue with the Windows client, and have had some Linux clients that I have been able to “clear up”, but I can’t explain why yet (see incident DIRSTUDIO-414 at the ApacheDS project). I did not have this same problem connecting to an ApacheDS server (I did set one up when I ran into this problem – very easy!), and I still haven’t set up a secure OpenLDAP server yet, so I can’t provide any feedback on that.

My next step was to write some sample code to exercise OpenLDAP and OpenDS a little. Again, no problems with either, including secure connections using TLS to the OpenDS server (which makes the problem above even more perplexing). I put in a little data, updated it, etc.

The last thing I did was to do some data exports / imports from OpenDS to OpenDS, OpenLDAP to OpenLDAP, OpenLDAP to OpenDS, and OpenDS to OpenLDAP.  The only thing that was unexpected here was that OpenDS was placing UUID information into the LDIF, which OpenLDAP was NOT HAPPY with (as it shouldn’t be).  Removing the UUID from the LDIF took care of the issue.  After that, everything was great.
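For anyone hitting the same import problem, stripping the attribute can be scripted. This is a minimal sketch, assuming the offending attribute is named entryUUID (the post only says “UUID information”, so check your own export), and using a made-up sample entry. It also drops folded continuation lines, which LDIF marks with a single leading space.

```python
def strip_attribute(ldif_text, attr="entryUUID"):
    """Remove every `attr:` line, plus its folded continuation lines, from LDIF text."""
    prefix = attr.lower() + ":"
    kept = []
    skipping = False
    for line in ldif_text.splitlines():
        if skipping and line.startswith(" "):
            # Continuation of the attribute value we are dropping.
            continue
        # Attribute names are case-insensitive; this also matches `attr::` (base64).
        skipping = line.lower().startswith(prefix)
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical exported entry; the attribute name and values are assumptions.
sample = (
    "dn: uid=alice,ou=people,dc=example,dc=com\n"
    "objectClass: inetOrgPerson\n"
    "uid: alice\n"
    "cn: Alice Example\n"
    "sn: Example\n"
    "entryUUID: 11111111-2222-3333-4444-555555555555\n"
)

cleaned = strip_attribute(sample)
print(cleaned)
```

Run the cleaned LDIF through a test import before trusting it; other operational attributes may need the same treatment depending on the server versions involved.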

Automated Testing

My initial instinct was to go for my good old trusted friend JMeter. I did run a couple of tests through JMeter before I decided that it was a little more work than I wanted to take on, so I went looking for another tool. That is when I ran across SLAMD. I have been very happy with SLAMD thus far, even though I know I have only scratched the surface. Keeping in mind that I don’t have a ton of data in the stores, I headed off to run one of the LDAP Load Generator projects. I created two – one for OpenDS and one for OpenLDAP. These tests ran continuously over an 8 hour period with 2 client machines running 40 threads apiece, with the first 10 minutes and last 10 minutes of the 8 hours used as warm-up and cool-down time. Some graphs from those tests are included below.

[Figure: SLAMD Comparison of Overall Operations Attempted]

[Figure: SLAMD Comparison of Overall Operation Time]

The Verdict

While I have not spent a ton of time analyzing these results, and certainly have not done any tweaking to affect them in any way, it would appear that OpenDS has better performance than OpenLDAP in our simple tests. With a few exceptions, OpenLDAP’s performance seemed to vary a little more widely than OpenDS’s, which surprised me a little. There were some highly variable points with the OpenDS server, but they are very high peaks spread out over what look like very consistent intervals. I would tend to interpret these as garbage collection since OpenDS is Java based. As I said, I have only begun to really investigate these results, so I welcome any other interpretations or experiences that others have, and I will continue to do some further investigation, run additional tests, and test out synchronization.
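One way to check the garbage collection hunch is to measure the spacing between the peaks: if the gaps are nearly constant, a periodic cause such as GC becomes more plausible. Below is a minimal sketch, assuming the response times have been exported as a simple list. The sample numbers, the threshold, and the tolerance are made up for illustration; this is not a SLAMD feature, and confirming GC would still require looking at the JVM’s own GC logs.

```python
def spike_gaps(samples, threshold):
    """Return the gaps (in sample counts) between consecutive values above threshold."""
    spikes = [i for i, value in enumerate(samples) if value > threshold]
    return [later - earlier for earlier, later in zip(spikes, spikes[1:])]


def looks_periodic(gaps, tolerance=1):
    """True when there are at least two gaps and they differ by at most `tolerance`."""
    return len(gaps) >= 2 and max(gaps) - min(gaps) <= tolerance


# Hypothetical response times in milliseconds, one per collection interval.
response_ms = [4, 5, 4, 60, 5, 4, 5, 58, 4, 5, 4, 61, 5]

gaps = spike_gaps(response_ms, threshold=30)
print("gaps: %s, periodic: %s" % (gaps, looks_periodic(gaps)))
```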

Why Social DevCamp East is Important

And Why You Should Attend

Social DevCamp East is being held in Baltimore again – the fall session was recently announced for November 1st.  

What is Social DevCamp?  

“Social DevCamp East is the Unconference for Thought Leaders of the Future Social Web ….. Social DevCamp East Fall 2008 once again invites east coast developers and technology business leaders to come together for a thoughtful discussion of the ideas and technologies that will drive the future of the social web.” – Social DevCamp Site

So, why should you care? There are several reasons why this is an important event not only for developers and companies that are involved in social applications, but also for anyone interested in leading edge technologies and promoting the area ….

Social DevCamp Provides a Great Opportunity to Learn About Technology

Many people still think of social networking as a niche area, but that simply isn’t true.  If you have been watching the technology news lately or read Andy Monfried’s blog you may have heard or read that 40% of internet traffic is on social networks. 40%!  That’s impressive, but what does it mean?  It means that in order to support that type of growth, to handle that much traffic and that many eyeballs, the social media related companies need to continue to push the envelope, advancing technology in ways that we haven’t done before.  They need to determine how to store more data, access it quicker, relate it better, and make it user friendly and useful enough for people to keep coming back.  The amount of data that many of these companies have to deal with is simply staggering.  Social media related companies are innovators, learning how to push existing technology to the limits and develop new technologies and strategies where needed.

The point is that whether you are a social media developer or not, there are important things that you can hear about at Social DevCamp that you can take back to your company and use there. Topics will include things like Web 3.0, Cloud Computing, Crowdsourcing, Building Out the Semantic Web, Mobile Development Best Practices, and other topics similar in nature.

Social DevCamp Provides a Great Opportunity to Network

This event is targeted to the East Coast, not just Baltimore. That provides an opportunity to meet innovators from up and down the coast. The Greater Baltimore Technical Council will be there, as will people that have been involved with twittervision, the Mozilla Foundation, and others. These are real people with real ideas, building and leveraging the latest technologies, who will be sharing ideas in a setting designed for, and conducive to, discussing ideas. This is not a place where people are selling technology to you at a vendor show, but rather are talking about technology and where and how it is being used. What a great way to spend at least part of your day.

Social DevCamp Highlights the Great Opportunities and Companies on the East Coast

When people think of new technology, start-up companies, and people living on the bleeding edge of technology, they tend to think of the West Coast.  There are fantastic things going on out there, and there is quite a bit of innovation … but that doesn’t mean that there isn’t anything happening here in the East.  It is a mindset kind of thing  – looking for new technology, go west young man.

That isn’t true though. There are great things happening on both coasts. The problem is that not everyone knows about the great opportunities and the great small / start-up companies that are here.

Social DevCamp East provides the opportunity to bring an awareness of some of the really mind-bending and fun technology that people use in this area.  People that are looking for exciting opportunities on the East Coast can look right here and see who may be using technologies in the areas of cloud computing, distributed caching, distributed processing, data portability, and areas such as mobile computing and crowdsourcing.   

As you can tell, I am really excited about this event. I was unaware of it until recently, when it was brought to my attention by someone with whom I work (thanks Bev!). In fact, I am excited enough about it that I have just sent an e-mail to the organizers stating that Lotame would like to be one of the sponsors. We want the same chance to see and hear the great things going on in this area and to mingle with smart people with great ideas.
