In my last blog entry, I used a combination of SSA name registration data and life expectancy tables to calculate the probability that a person we might meet at random will be male or female, and the probability of their age being within a certain range. This week I'm going to show a practical application for this.
As a recap, below are the curves for six example names: James, Terry, Barbara, Jennifer, Sandra and Robert.
Just about every retailer accepts payment by credit card, and every year there are billions of credit card transactions. When processing a card, retailers collect the purchaser's name as part of the card details. Retailers rarely, if ever, ask for the purchaser's gender or age.
If you run a brick-and-mortar storefront, you might have a rough idea of the demographics of your clientele, but if you run an online business, you are effectively blind. The better you understand who your customers are, the more effectively you can market to them.
Can you honestly say you have a good handle on the demographics of your customers?
In addition to running DataGenetics, I'm also involved in a company called GreatPokerHands, which sells a unique and novel product for teaching people how to play better poker. The product is a series of strategy cards that advise players on the strength of their starting hands, and on how that strength changes with the number of people at the table.
OK, enough sales pitch; if you want to know more, visit the site linked above. Some of my product sales come directly from this website, and we can use this sales data as an example database to analyze.
First of all, I scrubbed the sales database to extract just the first name from each sales record. By doing this, I removed the potential privacy implications associated with the storage and use of PII (Personally Identifiable Information), as a first name alone is not sufficient information to uniquely identify a person.
We only need the first name for this analysis anyway, and data minimisation is another great pillar of privacy (only collect, store and process the information you need, and no more ... there's no point in moving around data you are not going to use ... if it's sensitive data, it's nothing but a liability). Plus, using just the first name makes for smaller file sizes, which can be important if the database is large!
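The extraction step is simple enough to sketch. The following is a minimal Python example, assuming each record holds a full name as a single "First Last" string; the functions `first_name` and `tally_first_names` and the sample records are hypothetical, not the actual GreatPokerHands data or code. Only the resulting first-name counts need to be kept, and the full names can then be discarded.

```python
from collections import Counter


def first_name(full_name: str) -> str:
    """Return the first whitespace-separated token of a name, title-cased.

    Assumes names are stored as "First Last"; honorifics ("Dr.") or
    "Last, First" ordering would need extra handling.
    """
    parts = full_name.split()
    return parts[0].title() if parts else ""


def tally_first_names(full_names):
    """Keep only the first name of each record and count how often each occurs."""
    return Counter(name for name in map(first_name, full_names) if name)


# Hypothetical sample records; real customer data is not reproduced here.
records = ["james SMITH", "Terry Jones", "James Brown", "  Barbara  Lee ", ""]
counts = tally_first_names(records)
print(counts.most_common())
```

Title-casing merges "james" and "James" into one entry; blank records are dropped rather than counted as an empty name.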
If you're interested, below is a breakdown of the most popular names among purchasers of the product (I've removed the absolute numbers; the bars show the relative volume for each name). The names are almost entirely male.
This is not the entire story because, remember, there is a long tail of names (many purchases are made by people with distinctive or very uncommon names). Also, because female names have more spelling variants (plus a wider variety of names to start with), female names become more common the further down the tail you go. Does the higher quantity of less common female names in the long tail balance out the sales attributed to males with more common names?
What is the overall demographic profile of the people who purchase these poker strategy cards? What is their average age? What is their gender? (Are there more females than males overall?) The more we know, the more we can optimize our marketing strategy. Should GreatPokerHands be advertising in products targeted towards students, middle-aged people or retired people? Should it be targeting magazines and activities enjoyed by men, or by women? Let's find out …
By going through each sales record in turn and adding together (superposing) the name-based curves for every purchaser, we can work out the demographic distribution of the entire customer base. Here are the results:
So, for GreatPokerHands, the analysis predicts that the client base is 84% male and 16% female. Also, the modal age is probably between 40 and 60.
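The superposition of per-name curves can be sketched as follows. This is a minimal Python illustration under stated assumptions: each name is represented by a table of relative weights over sex and a few coarse age buckets (the real analysis used SSA registration data and life tables, which are not reproduced here), and the `NAME_PROFILES` numbers and function names are made up for illustration. Each purchaser contributes a total weight of one, spread across the buckets according to their name's profile; dividing the summed weights by the number of purchasers gives the overall gender split and age distribution.

```python
from collections import Counter

AGE_BUCKETS = ["0-17", "18-39", "40-59", "60+"]

# Made-up relative weights (NOT real SSA data): how a living person with each
# name is spread across sex ("M"/"F") and the age buckets above.
NAME_PROFILES = {
    "James": {"M": [10, 25, 30, 30], "F": [0, 1, 1, 1]},
    "Terry": {"M": [1, 5, 20, 14], "F": [1, 3, 10, 6]},
    "Barbara": {"M": [0, 0, 0, 1], "F": [1, 4, 20, 45]},
}


def superpose(name_counts, profiles):
    """Add up every purchaser's normalized sex/age distribution.

    Names without a profile (or with an all-zero profile) are skipped.
    Returns (male_share, female_share, {age_bucket: share}).
    """
    totals = {"M": [0.0] * len(AGE_BUCKETS), "F": [0.0] * len(AGE_BUCKETS)}
    people = 0
    for name, count in name_counts.items():
        profile = profiles.get(name)
        norm = sum(profile["M"]) + sum(profile["F"]) if profile else 0
        if norm == 0:
            continue
        for sex in ("M", "F"):
            for i, weight in enumerate(profile[sex]):
                totals[sex][i] += count * weight / norm
        people += count
    if people == 0:
        raise ValueError("no purchasers with a known name profile")
    male = sum(totals["M"]) / people
    female = sum(totals["F"]) / people
    ages = {b: (m + f) / people for b, m, f in zip(AGE_BUCKETS, totals["M"], totals["F"])}
    return male, female, ages


male, female, ages = superpose(Counter({"James": 2, "Terry": 1, "Barbara": 1}), NAME_PROFILES)
print(f"male {male:.0%}, female {female:.0%}")
print({bucket: round(share, 3) for bucket, share in ages.items()})
```

Because every purchaser with a known name contributes exactly one unit of weight, rare names in the long tail count just as much per sale as common ones, which is what lets the overall curve answer the long-tail question.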
For the above analysis, the probabilities were not adjusted for age, but since we know the input in this example comes from credit card sales records, we can factor this in. People under the age of 18 do not typically possess credit cards, so we can re-run the analysis using, for each name, only the portion of its age distribution covering people aged 18 or older. The results are shown below:
Now that we have these curves, we can use them to target our advertising.
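The 18-and-over re-run amounts to conditioning each name's curve on the purchaser being an adult before superposing. Here is a minimal Python sketch, assuming each name's data is held as a table of relative weights by sex and age bucket with a bucket boundary exactly at 18; the weights and the function names `normalize` and `adults_only` are hypothetical.

```python
AGE_LOWER_BOUNDS = [0, 18, 40, 60]  # bucket i starts at this age

# Made-up relative weights (NOT real SSA data) for a single name.
profile = {"M": [30, 10, 5, 5], "F": [20, 10, 10, 10]}


def normalize(weights):
    """Scale a sex/age weight table so all entries sum to 1."""
    total = sum(sum(row) for row in weights.values())
    if total == 0:
        raise ValueError("profile has no weight left to normalize")
    return {sex: [w / total for w in row] for sex, row in weights.items()}


def adults_only(weights, min_age=18):
    """Condition a name's sex/age table on age >= min_age.

    Buckets starting below min_age are zeroed and the rest renormalized.
    Assumes a bucket boundary falls exactly at min_age.
    """
    trimmed = {
        sex: [w if lower >= min_age else 0 for w, lower in zip(row, AGE_LOWER_BOUNDS)]
        for sex, row in weights.items()
    }
    return normalize(trimmed)


before = normalize(profile)
after = adults_only(profile)
print(f"P(male), all ages: {sum(before['M']):.2f}")     # 0.50
print(f"P(male), adults only: {sum(after['M']):.2f}")   # 0.40
```

The cut-off can shift the gender estimate as well as the age estimate: in this made-up example the under-18 weight is skewed male, so removing it lowers P(male) from 0.50 to 0.40.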
Merging together two different concepts, remember back to my blog article on Social Demographics, which tracked the likes and activities of Facebook users?
Using a little mathematics, we can compare the shape of our GreatPokerHands demographic curve against a portfolio of Facebook keywords like the ones above.
The table on the left shows the percentage correlation between the profile curve for GreatPokerHands sales and 188 keywords on Facebook (a selection of fan sites, games and applications on the platform).
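The post describes its measure only as "a technique similar to Chi-Squared matching", so the exact formula is not known. As a stand-in, here is a minimal Python sketch using a symmetric chi-squared distance between two normalized profiles, converted into a similarity so that identical curves score 100%. The profiles are assumed to be vectors over the same sex/age bins, and the keyword names and numbers are made up.

```python
def chi2_similarity(p, q):
    """Return 1 minus the symmetric chi-squared distance of two profiles.

    Both profiles are normalized to sum to 1 first. The distance
    0.5 * sum((p_i - q_i)**2 / (p_i + q_i)) is 0 for identical shapes and
    at most 1, so the similarity lies in [0, 1].
    """
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    distance = 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)
    return 1.0 - distance


def rank_keywords(target, keyword_profiles):
    """Sort keywords from most to least similar to the target profile."""
    scores = {name: chi2_similarity(target, prof) for name, prof in keyword_profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical profiles over six bins: male 18-39, 40-59, 60+, then female 18-39, 40-59, 60+.
target = [30, 40, 14, 5, 8, 3]
keywords = {
    "sales (itself)": target,
    "keyword A": [25, 35, 20, 6, 9, 5],
    "keyword B": [3, 5, 2, 40, 35, 15],
}

ranking = rank_keywords(target, keywords)
for name, score in ranking:
    print(f"{name:15s} {score:7.2%}")
```

Comparing the sales curve with itself gives exactly 100%, as in the top entry of the table. Other measures, such as a Pearson correlation between the two vectors, produce scores on a different scale, so these percentages are not directly comparable to the ones quoted in the post.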
At the top, in position #1, is GreatPokerHands itself! (There is 100% correlation between the GreatPokerHands sales curve and itself.)
Next in similarity, in position #2, with a 74.77% correlation (calculated using a technique similar to Chi-Squared matching) is Golf, followed by Star Trek, Scuba Diving and NASCAR.
Fishing comes next, followed by Investing (which makes sense considering the mathematical nature of poker, and the desire to speculate and accumulate wealth), followed by Chess, and then Dilbert.
At the other end of the scale, the keywords whose curves are furthest from the population curve for GreatPokerHands card sales include Tiny Prints (poker players, it seems, are not very interested in making wedding invitations or custom greeting cards!). They are also not into Sewing, Knitting or Cake Decorating. On TV, they are not excited by Sex and the City, America's Next Top Model or Desperate Housewives. They don't listen to Justin Bieber or play Pokemon or Sorority Life games, and, oh, they don't wear Make-up!
Based on this analysis, it looks like you might be seeing more of my TV commercial during the golf shows on TV!
This kind of analysis is not limited to credit card records; maybe you have collected a database of names through some other means? If you'd like help unlocking additional value in your data, or would like us to process your sales data to help build a demographic profile of your users, send us an email to request more information.