[nycbug-talk] Scary Ubuntu privacy junk

Pete Wright pete at nomadlogic.org
Thu Nov 1 14:22:31 EDT 2012

On 11/01/12 10:52, George Rosamond wrote:
>> gathering/mining and analyzing all of this data is *very* expensive and
>> it would not be happening if there was no monetary value in it.  the fact
> Is it really *that* expensive?  Of course Amazon is doing it for a
> reason, and it's worthwhile, but aggregating data and storing on itself
> isn't.  Having the mechanism to analyze is higher cost, but with any
> group's search data, I'm sure it's worth it.

initially it seems like a pretty simple problem to solve.  you run
a website and/or adserver - you drop a cookie in a browser and log every
time you see that cookie UUID show up when being served another ad or
page impression in the future.  but let's get a little crazy and serve an
ad that is related to what we suspect your interests are based on past
impressions, i.e. targeting in the adserver world.
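
to make that concrete, here is a rough python sketch of the cookie + targeting
loop described above.  everything in it is made up for illustration: the
in-memory `impressions` store, the `ADS` inventory and the `handle_request`
function are hypothetical names, and a real adserver would keep this state
somewhere durable rather than in a dict.

```python
import uuid
from collections import Counter, defaultdict

# hypothetical in-memory store: cookie uuid -> counts of page categories seen
impressions = defaultdict(Counter)

# hypothetical ad inventory, keyed by category
ADS = {"bsd": "ad for a bsd conference", "coffee": "ad for coffee beans"}
DEFAULT_AD = "generic house ad"


def handle_request(cookie, category):
    """log one impression and return (cookie, ad) for the response."""
    if cookie is None:
        cookie = str(uuid.uuid4())  # first visit: drop a new cookie
    # pick the ad from *past* impressions, before logging this one
    history = impressions[cookie]
    if history:
        top_category, _ = history.most_common(1)[0]
        ad = ADS.get(top_category, DEFAULT_AD)
    else:
        ad = DEFAULT_AD
    history[category] += 1
    return cookie, ad
```

a first request with no cookie gets the default ad and a fresh UUID; later
requests carrying that UUID get an ad picked from whatever category has shown
up most often so far.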

it gets expensive when you want to access, process, slice and dice this
data in a reliable manner so you can target ads quickly and
efficiently.  reliable is the key term here.  sure you can dump all your
access_logs into S3 - but if you want to do any sort of processing on
this data you most likely are going to do some mapreduce-y type of work
and eventually dump that processed data into a relational database.
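
as a toy version of that pipeline, assuming a made-up log format of
"<cookie-uuid> <url-path>" per line: a map step, a reduce step, and a load
into a relational table.  sqlite3 stands in for the real database here, and
`LOG_LINES`, `map_line`, `reduce_counts`, `load` and the `interest` table are
all hypothetical names.  the real thing would run over far more data than
fits in one process, which is where the cost comes from.

```python
import sqlite3
from collections import Counter

# hypothetical log format: "<cookie-uuid> <url-path>" per line
LOG_LINES = [
    "u1 /coffee/beans",
    "u1 /coffee/grinders",
    "u2 /bsd/zfs",
    "u1 /bsd/pf",
]


def map_line(line):
    """map step: emit ((cookie, category), 1) for one log line."""
    cookie, path = line.split()
    category = path.strip("/").split("/")[0]
    return (cookie, category), 1


def reduce_counts(pairs):
    """reduce step: sum the counts for each (cookie, category) key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def load(totals, conn):
    """dump the reduced counts into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS interest "
        "(cookie TEXT, category TEXT, hits INTEGER)"
    )
    conn.executemany(
        "INSERT INTO interest VALUES (?, ?, ?)",
        [(c, cat, n) for (c, cat), n in totals.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(reduce_counts(map(map_line, LOG_LINES)), conn)
```

once the counts are in a table, the reporting side is plain SQL, e.g.
`SELECT category, hits FROM interest WHERE cookie = 'u1'`.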

then you decide that generating reports isn't enough, you want your 
servers to act on this data in a timely manner...so then you build an 
infrastructure that can do high speed data warehousing and 
analytics...then you realize "oh, poop.  ec2 kinda sucks since it is 
predicated on being unreliable.  now i gotta either duplicate all my ec2 
instances in N+1 regions or build my own global infrastructure."  and
then the next thing you know you have a bigger backend supporting your 
adserver or website, just to mine user data.

but i digress...sure, storing data is relatively cheap.  actually 
turning that data into something you can monetize is an expensive 

also...have you seen how the cost of coffee is going up these days?(1)
maybe that is where all the money is going - you gotta feed these coders 
one way or another :)


Pete Wright
pete at nomadlogic.org
twitter =>  @nomadlogicLA
