It’s the end of the world as we know it, and Americans feel fine


When I was young, we would be assigned to read books like 1984 in high school.  These were viewed as dystopian novels, as cautionary tales.  We would have the usual earnest class discussions.  Some feared the outcome, some thought it unlikely.  But everyone agreed that it would be a really bad thing.

Robin Hanson points out that 1984 has arrived, albeit 27 years late.  And what’s interesting is that no one seems to care:

Soon the police will always be watching every public move you make:

“A vast system that tracks the comings and goings of anyone driving around the District. … More than 250 cameras in the District and its suburbs scan license plates in real time. …

With virtually no public debate, police agencies have begun storing the information from the cameras, building databases that document the travels of millions of vehicles. … The District [of Columbia] … has more than one plate-reader per square mile, the highest concentration in the nation. Police in the Washington suburbs have dozens of them as well … creating a comprehensive dragnet that will include all the approaches into the District. … The data are kept for three years in the District. … Police can also plug any license plate number into the database and, as long as it passed a camera, determine where that vehicle has been and when. …”

As prices rapidly fall, this will be widely deployed. Unless there is a public outcry, which seems unlikely at the moment, within twenty years most traffic intersections will probably have tag readers, neighboring jurisdictions will share databases, and so police will basically track all cars all the time. With this precedent, cameras that track pedestrians and people in cars via their faces and gaits will follow within another decade or two…

(Via TheMoneyIllusion)

Big Data, Fast & Slow: Why HP’s Project Moonshot Matters


In Marz’s presentation, which describes how Twitter’s Storm project complements Hadoop in the company’s analytics efforts, Marz says in essence (and here I’m heavily paraphrasing and expanding) that there are really two types of “Big Data”: fast and slow.

Fast “Big Data” is real-time analytics, where messages are parsed for some kind of significance as they come in at wire speed. In this type of analytics, you apply a set of pre-developed algorithms and tools to the incoming datastream, looking for events that match certain patterns so that your platform can react in real time. A few examples: Twitter runs real-time analytics on the Twitter firehose in order to identify trending topics; Topsy runs real-time analytics on the same Twitter firehose in order to identify new topics and links that people are discussing, so that it can populate its search index; a high-frequency trader runs real-time analytics on market data in order to identify short-term (often in the millisecond range) market trends so that it can turn a tiny, quick profit.

Real-time analytics workloads have a few common characteristics, the most important of which is that they are latency sensitive and compute-bound. These workloads are also bandwidth intensive in that the compute part of the platform can process more data than storage and I/O can feed it (hence the compute bottleneck). People doing real-time analytics need lots and lots of CPU horsepower (and even GPU horsepower in the case of HFT), and they keep as much data as they can in RAM so that they’re not bottlenecked by disk I/O.
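To make the “fast” case concrete, here is a minimal sketch of the trending-topics idea in Python: a fixed pattern (hashtags) is applied to each message as it arrives, and the counts live entirely in memory over a sliding time window. This is only an illustration, not how Twitter or Storm actually do it. The class name `TrendingCounter`, the 60-second window and the sample messages are all made up for the example, and the sketch assumes messages arrive in timestamp order.

```python
from collections import Counter, deque


class TrendingCounter:
    """Count hashtags seen within the last `window_seconds` of a stream.

    Hypothetical helper for illustration; assumes timestamps never decrease.
    """

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, hashtag) in arrival order
        self.counts = Counter()  # hashtag -> occurrences inside the window

    def add(self, timestamp, text):
        # The pre-developed "pattern": any whitespace-delimited token
        # that starts with '#' and has at least one more character.
        for token in text.split():
            if token.startswith("#") and len(token) > 1:
                tag = token.lower()
                self.events.append((timestamp, tag))
                self.counts[tag] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that are at least `window_seconds` old.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, tag = self.events.popleft()
            self.counts[tag] -= 1
            if self.counts[tag] == 0:
                del self.counts[tag]

    def top(self, n=3):
        return self.counts.most_common(n)


# Made-up sample stream of (seconds, message) pairs.
stream = [
    (0.0, "Watching the game #superbowl"),
    (10.0, "#superbowl halftime!"),
    (20.0, "New post on #hadoop and #storm"),
    (70.0, "#storm is neat"),
]

counter = TrendingCounter(window_seconds=60.0)
for ts, msg in stream:
    counter.add(ts, msg)

print(counter.top())
```

Everything here stays in RAM and each message is handled once on arrival, which is the property the passage above describes; nothing is read back from disk to answer “what is trending right now”.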

I’ve drawn a quick and dirty diagram of this process, above. As you can see, the bottlenecks for Hadoop are the disk I/O from the data archive and the human brain’s ability to form hypotheses and turn them into queries. The first bottleneck can be addressed with SSD, while fixing the second is the job of the growing stack of more human-friendly tools that now sits atop Hadoop.

More at Cloudline

Understanding human mobility with machine learning and a billion check-ins


At foursquare, we believe there is a huge opportunity to apply machine learning algorithms to the collective movement patterns of millions of people and build new services which help people better understand and connect with places… In the slides below, we talk briefly about the data at foursquare and some interesting applications of machine learning. Enjoy!

Machine Learning and Big Data at Foursquare


(Via Foursquare Engineering Blog)