[talk] Machine Learning

Sat Aug 26 09:21:44 EDT 2017

On Sat, Aug 26, 2017 at 7:18 AM, Thomas Levine <_ at thomaslevine.com> wrote:

> I don't follow your use case chart; could you give an example?
>
> It sounds, though, like you and your friends are talking past each other.
> They discuss job requirements, and you discuss software design. Dijkstra
> commented on this common conflict.
>
>   Simplicity is a great virtue but it requires hard work to achieve it
>   and education to appreciate it. And to make matters worse: complexity
>   sells better.
>   https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD896.html
>
> Ignoring the need to sell newfangled complexity, I find generic search
> engines, strict text parsers, and summary statistics to be far more
> practical, effective, and reliable than the overwhelming majority of
> branded machine learning products. This is just like how I prefer the
> reliability and ease-of-use of OpenBSD over the novelty, complexity, and
> opacity of most of the other popular contemporary operating systems.
>
> _______________________________________________
> talk mailing list
> talk at lists.nycbug.org
> http://lists.nycbug.org/mailman/listinfo/talk
>

> Ignoring the need to sell newfangled complexity, I find generic search
> engines, strict text parsers, and summary statistics to be far more
> practical, effective, and reliable than the overwhelming majority of
> branded machine learning products. This is just like how I prefer the
> reliability and ease-of-use of OpenBSD over the novelty, complexity, and
> opacity of most of the other popular contemporary operating systems.

One of the reasons machine learning is becoming popular is that big data
is a commodity now. Users can store vast amounts of data in "the cloud"
with BigQuery, Hadoop, Redshift, Oracle RAC, etc. Ten years ago the
dominant challenge was cheaply storing and querying data, but now that is
becoming a commodity. An enterprise can buy Tableau, use Amazon Redshift,
and have an analyst (or a non-technical product manager) give them summary
statistics until all parties are blue in the face, with 500 scheduled
reports running a day.

You cannot replace summary statistics with machine learning. A classic
machine learning tool is linear regression, which you can use to make
predictions: you take a dataset, train a model on it, and then use that
model to predict new values.
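To make the "train a model, then predict" loop concrete, here is a minimal sketch of one-feature linear regression (ordinary least squares) in plain Python. The function names and the hour-of-day/bid-price numbers are hypothetical, made up for illustration; a real system would use a statistics or ML library rather than hand-rolled code.

```python
from typing import List, Tuple

def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Train: ordinary least squares for one feature, returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the common 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model: Tuple[float, float], x: float) -> float:
    """Predict: apply the trained slope and intercept to a new input."""
    slope, intercept = model
    return slope * x + intercept

if __name__ == "__main__":
    # Hypothetical training data: hour of day vs. winning bid price.
    hours = [1.0, 2.0, 3.0, 4.0]
    prices = [0.30, 0.50, 0.70, 0.90]
    model = fit_line(hours, prices)
    print("model:", model, "prediction for hour 5:", predict(model, 5.0))
```

The training step runs once over the dataset; after that, predictions are just one multiply and one add per input.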

For example, given users with tens/hundreds/thousands of attributes (age,
gender, ...) and a bid request with tens/hundreds/thousands of attributes
(time of day, URL, ...), which attributes can be used to predict the final
bid price? Running one process (linear regression) that tells you which
attribute or combination of attributes predicts the price COULD BE
easier/simpler than having humans attempt to figure it out by producing
different sets of summary statistics, collectively deciding what to
optimize on, and constantly re-evaluating the rules as the landscape
changes.
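With many attributes the same idea becomes multiple linear regression: one fit yields a weight per attribute. The sketch below solves the least-squares normal equations with Gaussian elimination in plain Python. The attribute names (`hour_of_day`, `user_age`), the function names, and the synthetic noise-free prices are all hypothetical; a production system with thousands of attributes would use a numerical library and regularization rather than this direct solve.

```python
from typing import List, Sequence

def fit_multiple(rows: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Least-squares fit of target ~ w0 + w1*x1 + ... + wk*xk.

    Solves the normal equations (X^T X) w = X^T y by Gaussian elimination.
    Returns [intercept, w1, ..., wk].
    """
    # Prepend a constant 1 to each row so w0 acts as the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(X, targets))] for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("attributes are collinear; normal equations are singular")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    w = [0.0] * k
    for i in range(k - 1, -1, -1):
        w[i] = (A[i][k] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    return w

def predict_multiple(w: Sequence[float], row: Sequence[float]) -> float:
    """Apply the fitted intercept and per-attribute weights to one bid request."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))

if __name__ == "__main__":
    # Hypothetical bid requests: (hour_of_day, user_age) per row.
    rows = [(1, 20), (2, 35), (3, 25), (4, 40), (5, 30)]
    # Synthetic prices generated from known weights, so the fit should recover them.
    prices = [0.5 + 0.1 * h + 0.02 * a for h, a in rows]
    weights = fit_multiple(rows, prices)
    print("intercept, hour_of_day, user_age:", weights)
    print("predicted price:", predict_multiple(weights, (6, 50)))
```

With real bid data the fit will not be exact, and comparing weight sizes to judge which attribute matters most only makes sense once the attributes are on comparable scales (for example, after standardizing each one).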

Obviously there is a hype cycle; not every problem needs machine learning
to solve it. But get readdyyy for a ::shocker:: not everything is solved by
the BSD ports tree.