The Bestseller Code - Review

Despite all the efforts of publishers, it has always seemed impossible to predict whether or not a book would be a runaway bestseller. This isn't too surprising - it's the kind of thing that is inherently unpredictable because there are simply so many variables involved. Yet a newly published book suggests it is possible to do just that. Are the authors crazed or brilliant? Neither, really. They have put together a mechanism based on computerised text analysis that is good at spotting bestsellers - and yet, oddly, this doesn't contradict that inherent unpredictability. Why? Because there are two different levels of bestsellerdom involved - and because I think there's one bit of information missing from the book (apologies to the authors if I've missed it).

So what does the software do? By looking at various word uses, patterns and shaping, it can make a good stab at predicting whether or not a book is likely to have featured on the New York Times bestseller list. This is very impressive - and, along the way, Jodie Archer and Matthew Jockers give some excellent advice on things that authors can do (or at least try to do) to make their books more like these bestsellers.

This isn't a universal panacea. In fact the authors admit that what their algorithms spot is not what most would regard as great fiction. The system laps up the likes of Dan Brown and 50 Shades of Grey. But interestingly, it also provides a useful counter to those who say they can't understand why these kinds of books sell because they are terribly written. In fact, in a number of respects these books are very well written - it's just that the criteria for 'well written' are not those used by the lit. crit. brigade.

Not only is this not a recipe for producing great literature, it's not about producing books everyone would like either. Taking a quick skim through the top 100 books selected by the analysis, there are perhaps three I would consider reading. But many of us are not 'bestseller' readers. We like our own little niches, and that's fine. This system isn't for us - it is about finding likely hits for the traditional bestseller market.

This genuinely is all very interesting, although the book has surprisingly little content for a full-price hardback (it's large print, and there's a lot of dancing around exactly what they are doing). However, what absolutely isn't true is the book's assertion that 'mega-bestsellers are not black swans'. The system uses a number of measures, and though it's true that most mega-sellers like Harry Potter and 50 Shades do well on some of the measures, they pretty well all fall down on others. So, for instance, to write a bestseller we are encouraged to avoid fantasy, very British topics, sex and descriptions of bodies. What the model seems to do well is recognise what you might call the run-of-the-mill bestsellers, rather than single out the real runaway successes as stand-outs.

There was also that missing bit of information. The authors are keen to tell us how many books that scored highly on their system were on the bestseller list, and that really is impressive. But they don't mention false positives - how many books the system thought should be bestsellers but weren't. It would have been interesting to learn more about that.
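To see why false positives matter, here is a quick sketch with entirely made-up numbers (none of them come from the book). It compares two hypothetical systems that catch exactly the same number of real bestsellers, which is the figure the authors emphasise, but differ wildly in how many non-bestsellers they also flag. The names `Confusion`, `recall` and `precision` are just illustrative labels for the standard measures.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing a system's 'bestseller' calls with the actual list.
    All numbers used below are hypothetical, not taken from the book."""
    true_pos: int   # flagged as a bestseller, and was one
    false_pos: int  # flagged as a bestseller, but wasn't
    false_neg: int  # not flagged, but was a bestseller

    def recall(self) -> float:
        # Share of the real bestsellers that the system caught
        return self.true_pos / (self.true_pos + self.false_neg)

    def precision(self) -> float:
        # Share of the system's picks that really were bestsellers
        return self.true_pos / (self.true_pos + self.false_pos)


# Two hypothetical systems with identical hit counts
careful = Confusion(true_pos=80, false_pos=20, false_neg=20)
trigger_happy = Confusion(true_pos=80, false_pos=320, false_neg=20)

for name, c in [("careful", careful), ("trigger_happy", trigger_happy)]:
    print(f"{name}: recall={c.recall():.2f} precision={c.precision():.2f}")
```

Both report the same impressive recall of 0.80, but only one in five of the second system's picks is a real bestseller (precision 0.20). Without the false-positive figure, a reader can't tell which kind of system they are looking at.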

I'm sure we'll hear more of this kind of analysis, but I really hope publishers don't put too much stock in it - because it is very much a lowest-common-denominator approach (certainly from the viewpoint of someone who wouldn't consider reading more than 95% of its recommendations). That's not to say that the book isn't interesting - and for an author, there are some excellent insights into some of the things that attract this generic group of readers (or put them off) that are worth considering even if you write, say, science fiction or British crime fiction.

A fascinating piece of analysis, provided you don't take it all too seriously.

The Bestseller Code is available from amazon.co.uk and amazon.com.
Using these links earns us commission at no cost to you  
