
Big Data & Big Mistakes


Great read by Tim Harford in the FT:

Recall big data’s four articles of faith. Uncanny accuracy is easy to overrate if we simply ignore false positives, as with Target’s pregnancy predictor. The claim that causation has been “knocked off its pedestal” is fine if we are making predictions in a stable environment but not if the world is changing (as with Flu Trends) or if we ourselves hope to change it. The promise that “N = All”, and therefore that sampling bias does not matter, is simply not true in most cases that count. As for the idea that “with enough data, the numbers speak for themselves” – that seems hopelessly naive in data sets where spurious patterns vastly outnumber genuine discoveries.
“Big data” has arrived, but big insights have not. The challenge now is to solve new problems and gain new answers – without making the same old statistical mistakes on a grander scale than ever.
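
Harford's point about spurious patterns is easy to see in a toy simulation. The sketch below is my own illustration, not something from the article: it generates series of pure random noise, correlates every pair, and counts how many pairs look "significant". The function names, the 200 series of 10 points each, and the 0.632 cutoff (roughly the conventional 5% two-sided threshold for 10 data points) are all assumptions chosen for the example.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_spurious(num_series=200, length=10, threshold=0.632, seed=0):
    """Count pairs of unrelated noise series whose |r| exceeds the threshold.

    Every series is independent random noise, so any pair that clears
    the threshold is a spurious pattern by construction.
    """
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(length)]
              for _ in range(num_series)]
    pairs = list(combinations(range(num_series), 2))
    hits = sum(
        1 for i, j in pairs
        if abs(pearson(series[i], series[j])) > threshold
    )
    return hits, len(pairs)


if __name__ == "__main__":
    hits, total = count_spurious()
    print(f"{hits} of {total} noise pairs look 'significant'")
```

With 200 series there are 19,900 pairs to compare, so even a 5% false-positive rate yields hundreds of "discoveries" in data that contains nothing to discover. Adding more series makes the count grow faster, which is the "grander scale" problem in miniature.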

Simmons on Feeds

“RSS is at its most interesting and useful when big companies ignore it.” –Brent Simmons

The People Holding the Literal Keys to Internet Security

“The master key is part of a new global effort to make the whole domain name system secure and the internet safer: every time the keyholders meet, they are verifying that each entry in these online “phone books” is authentic. This prevents a proliferation of fake web addresses which could lead people to malicious sites, used to hack computers or steal credit card details.” –read more at The Guardian.

Computers On Law & Order

Jeff Thompson’s Computers On Law & Order includes 11,000 screenshots from all 456 episodes of the classic drama, and will trace the history of the computer across the show’s set design. Just amazing. Found here.

Craigslist Mirrors

Undoubtedly the most important thing I’ve seen in days: a Tumblr dedicated to photographs of mirrors on Craigslist. Just incredible.

Source Code in TV & Film


Source Code in TV & Film is wonderful, even if you don’t care much for code or understand it. Find out where all those blackboard formulas were lifted from and why they make no sense, or find the guy who actually wrote code to demonstrate the use of raw sockets in writing packet injection programs. The image above shows code used in White House Down when they were, of course, hacking into a mainframe. :)

Endangered Links

Can you help to save the endangered hyperlink?

Felix Salmon: Netflix’s Dumbed-Down Algorithms

The original Netflix prediction algorithm — the one which guessed how much you’d like a movie based on your ratings of other movies — was an amazing piece of computer technology, precisely because it managed to find things you didn’t know that you’d love. More than once I would order a movie based on a high predicted rating, and despite the fact that I would never normally think to watch it — and every time it turned out to be great. The next generation of Netflix personalization, by contrast, ratchets the sophistication down a few dozen notches: at this point, it’s just saying “well, you watched one of these Period Pieces About Royalty Based on Real Life, here’s a bunch more”.

Read the rest here.
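
For anyone curious what "guessing how much you'd like a movie based on your ratings of other movies" can look like, here is a minimal sketch of user-based collaborative filtering. To be clear, this is a textbook technique and not Netflix's actual algorithm, which the article does not describe in detail. The users, movie labels, ratings, and function names are all made up for illustration.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale; names and numbers are invented.
RATINGS = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "C": 1, "D": 5},
    "cat": {"A": 1, "B": 2, "C": 5, "D": 1},
}


def similarity(a, b):
    """Cosine similarity of two users over the movies both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)


def predict(ratings, user, movie):
    """Predict a rating as a similarity-weighted average of other users.

    Returns None when nobody comparable has rated the movie.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or movie not in their_ratings:
            continue
        weight = similarity(ratings[user], their_ratings)
        weighted_sum += weight * their_ratings[movie]
        weight_total += weight
    if weight_total == 0:
        return None
    return weighted_sum / weight_total


if __name__ == "__main__":
    print(predict(RATINGS, "ann", "D"))
```

Ann has never rated movie D, but her tastes track Bob's far more closely than Cat's, so the prediction leans toward Bob's high rating. That is the mechanism behind the "things you didn’t know that you’d love" effect Salmon describes, and it is quite different from recommending more titles in a genre you have already watched.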

Josh Marshall: Flipboard is a Scam Against Publishers?

Josh Marshall on Talking Points Memo’s decision to pull out of Flipboard, Google Currents, etc.:

But say you find TPM on Flipboard, decide it’s great and add it to your viewing routine on Flipboard. Probably you just keep reading us on Flipboard. Clearly you like Flipboard or you wouldn’t be using it. So why would you start visiting TPM? You likely won’t. That may be great for you. It’s definitely great for Flipboard. But is it great for us? Not really. It boosts my ego, I guess. And more people may know about us. But where and how does that turn into our ability to convert that ‘audience’ into a revenue stream that allows us to create our product? I don’t think it does. Or it does in so in such a trivial and unquantifiable way as to be meaningless.

How does he know that users don’t connect the dots back to the site after discovering it on Flipboard? He’s basing a ton of his opinion on that assumption. I know from my own experience that I love using tools like Flipboard, Feedly, etc. to discover new sites. When I like them, I add them to my reader, visit those sites, open those links, and share the articles I like here and in other places. Maybe the majority of users don’t convert in that same way. That said, I do understand his issue with the fuzzy logic around how things like reach and brand awareness benefit them when their goal is to find revenue streams to keep producing their work. I totally get that.

However, if you cut yourself off from readers in an attempt to own every page view so your banner ads are more valuable, you’re holding yourself back from real potential in both revenue and reader growth. (I’m not saying that’s his plan; my point is that the plan in general is a more traditional one, focused on hard data like CTR and direct streams.) I don’t quite get it, even though I understand (and sympathize with) large-scale digital news sites that now struggle with million-dollar solvency issues every year, much as newspapers scrambled to do years ago. For Marshall it’s a clear-cut cost-based analysis, but I’m not sure he’s correct to cut off sources of value that don’t ‘directly’ influence revenue.

Google’s “Hummingbird” Prioritizes Quality Over Keywords

“The Hummingbird update will put less emphasis on matching keywords and more emphasis on understanding what a user is most likely hoping to obtain in their search results. If I can give businesses one piece of advice after this update, it’s to prioritize a well-rounded online marketing strategy that continues to deliver a clear message. Every business in America has an audience, but not every business in America understands the needs of their audience. The companies who prioritize the needs of their users and create content to satisfy those needs will see the biggest successes in the future.”

Read more about Hummingbird’s impact over at Wired. I really hope content marketers and people working in search are paying attention, before they no longer have jobs.

Baldur Bjarnason: “Computers are too difficult and people are computer illiterate”

That blogger also demonstrates his linguistic ignorance when he explains that he likes to be an arsehole whenever somebody uses the term ‘internet’ to mean ‘my access to the internet’ instead of the internet itself. As in ‘the internet isn’t working’.

I mean, just how stupid do you have to be to not realise that almost everybody who says this knows very well that the entire internet hasn’t stopped working? It’s analogous to saying ‘the TV channels aren’t working’ when your cable TV set-top box is on the fritz. It doesn’t mean you think those channels aren’t broadcasting. It means that you don’t have access to any of them.

It isn’t just stupid to misunderstand language like this, it demonstrates a wilful ignorance of spoken English, wilful because he’s clearly heard the phrase often enough to understand what people are actually trying to say.

Please read.

Anil Dash’s Ten Rules of Internet

7. Most websites treat “I like it” and “This is good” as the same thing, leading to most people on the Internet refusing to distinguish between “I don’t like it” and “It’s not good.”

From Anil Dash’s Ten Rules of Internet.

Robin Sloan on RSS

As long as the URL resolves, a feed can still surprise you. RSS is the true web: a loose net of dark filaments. These faint tendrils of connection are almost invisible when quiescent, but then out of nowhere—hello!—they light up again. I am happy to have them.

Preach it.

Mobile Only Users

This is a great article, and super informative for folks who might still feel that a good mobile experience is too complicated to be worth the return.

Mobile-only users aren’t some strange new breed of customer, signaling their desire for different messages, content, and services through their choice of screen size and form factor. They’re just your customer. You can and should speak to them in same way you address all your other customers. They just want to engage with you on the device that’s most useful and convenient for them.

Meeting the needs of the mobile-only user doesn’t mean agonizing about “the mobile use case,” trying to determine which subset of content would be most useful to users “on-the-go.” Google reports that 77 percent of searches from mobile devices take place at home or work, only 17 percent on the move. Mobile users should get the same content. It’s frustrating and confusing for them if you only give them a little bit of what you offer on your “real” website. If you try to guess which subset of your content the mobile user needs, you’re going to guess wrong. Deliver the same content as your desktop user sees. (If you think some of your content doesn’t deserve to be on mobile, guess what — it doesn’t deserve to be on the desktop either. Get rid of it.)

“…on the device that’s most useful and convenient for them.” That could be a mobile device, or it might not be. Either way, they deserve the best experience regardless of their choice. What we’re really talking about is proximate use, not strictly ‘mobile only’ use. I don’t think we’ll convince businesses that mobile runs through every part of the customer pie, so to speak, as long as we keep segmenting users out by device when we talk about the site experience.

All customers deserve great site experiences regardless of device, and many users choose their devices based on what’s near them, not according to some mysterious code we can’t break. Since many of those users are indeed coming from mobile devices, and those numbers are rising, it’s more important than ever to treat them as well as the ‘traditional’ customers you’ve developed desktop site experiences for.

Internet Archive from Deepspeed media on Vimeo.

Video: Internet Archive Short Documentary

An incredible look at how the Internet Archive functions.
