Marco Santoni

6 Take-Aways after Reading "The Signal and The Noise"

Published: Sat 11 November 2017
By Marco Santoni

In posts.

The Signal and The Noise by Nate Silver is a must-read for anyone interested in predictions. It is not a technical book, and you will not learn any algorithms from it. Instead, it presents a series of real-world scenarios in which predictions worked and in which they did not. The book is …
read more
My Talk about Superset [Python Milano Meetup]
Published: Thu 22 June 2017
By Marco Santoni

In posts.

Yesterday, I gave a talk at the Python Milano Meetup. The Meetup was designed as Python pills: three 20-minute talks in a row. The talks:
- Superset: data visualization at AirBnB - Marco Santoni
- Java Vs Python - Cesare Placanica
- pdb in action - Lorenzo Mele
Very nice talk of @Airbnb #Superset with @MrSantoni at #PythonMilano …
read more
Manufacturing. When data is not a commodity

Published: Sat 25 February 2017
By Marco Santoni

In posts.

What does it mean to work as a data scientist in manufacturing? What is the value behind data? Data science has gained popularity in domains like the internet, but industrial production has its own specific requirements.

I gave a talk at Data Driven Innovation about the specific challenges when doing data …
read more
Weighted Random Sampling with PostgreSQL [Follow-up]

Published: Fri 10 February 2017
By Marco Santoni

In posts.

I received valuable feedback from Jim Nasby on the post about weighted random sampling with PostgreSQL. I reproduce Jim's email here.

Sadly, Common Table Expressions (CTEs) are insanely expensive, because each one must be fully materialized. So in your example, you're essentially creating 5 temp tables (one for …
read more
Monitoring Bus Frequencies in Rome

Published: Sat 21 January 2017
By Marco Santoni

In posts.

I have just launched atacmonitor. It is a website providing information about the waiting time at bus stops in Rome.

Overview

The data source is live data on bus waiting times from ATAC, Rome's public transport company. The transport office provides a public API with real-time data.

I have implemented a simple …
read more
Blog Migrated to Pelican on GitHub Pages

Published: Wed 28 December 2016
By Marco Santoni

In posts.

I have migrated my blog. It is now built with Pelican, a static site generator, which lets me write posts as plain Markdown or even as Jupyter notebooks. I then use GitHub Pages to version and publish the blog. I continue to use Aruba as my domain provider. It is sufficient …
read more
Insights from IEEE Big Data 16

Published: Mon 26 December 2016
By Marco Santoni

In posts.

I attended the IEEE Big Data 16 conference in Washington DC. I thank my company for sponsoring the trip. The conference included a special symposium dedicated to manufacturing, which hosted some participants of the Bosch Production Line Performance competition from Kaggle.

2016 IEEE International Conference on Big Data …
read more
Weighted Random Sampling with PostgreSQL
Published: Tue 23 August 2016
By Marco Santoni

In posts.

You have a table like the following:
```
CREATE TABLE weights (
    color varchar PRIMARY KEY,
    weight float
);

INSERT INTO weights (color, weight)
VALUES
    ('red', 8),
    ('blue', 3),
    ('green', 10),
    ('yellow', 10);
```
The table lists the weights associated with certain colors. Imagine a weight representing how much you like that color.
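To make the idea of weights concrete, here is a minimal Python sketch. It is not the PostgreSQL approach of the post itself. It assumes that a color should be drawn with probability proportional to its weight, and the names `WEIGHTS`, `selection_probabilities` and `sample_color` are made up for this illustration.

```python
import random
from collections import Counter

# Weights copied from the `weights` table above.
WEIGHTS = {"red": 8, "blue": 3, "green": 10, "yellow": 10}


def selection_probabilities(weights):
    """Return each color's probability, assuming it is weight / sum of weights."""
    total = sum(weights.values())
    return {color: weight / total for color, weight in weights.items()}


def sample_color(weights, rng=random):
    """Draw one color with probability proportional to its weight."""
    colors = list(weights)
    # random.choices accepts relative weights; they do not need to sum to 1.
    return rng.choices(colors, weights=[weights[c] for c in colors], k=1)[0]


if __name__ == "__main__":
    print(selection_probabilities(WEIGHTS))

    # Empirical check: frequencies should roughly follow the weights.
    rng = random.Random(0)
    counts = Counter(sample_color(WEIGHTS, rng) for _ in range(10_000))
    print(counts)
```

Under this assumption the weights sum to 31, so red would be picked with probability 8/31 and blue with probability 3/31.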

Now …
read more
Applied Bayesian Inference with PyMC [video]

Published: Thu 30 June 2016
By Marco Santoni

In posts.

I was glad to give an intro to Bayesian Inference at PyData Florence 2016. The video of the talk is now out.
read more
A Simple Machine Learning Pipeline

Published: Sun 19 June 2016
By Marco Santoni

In posts.

This post contains the code that I used in my talk at the Python Milano Meetup on June 22nd 2016. The talk was a quick overview of Pipeline, a nice scikit-learn API for abstracting your machine learning algorithm. The code is based on the Boston Housing Data Set.

We'll just load …
read more
