Data Science With RapidMiner

Monday, 23 March 2020

Corona virus death rates per million by country 23-03-2020

I've been exploring the latest figures for Corona virus deaths and I've combined them with population data to produce an interesting graph.

The graph shows deaths per million of population for each country. The Y axis is a logarithmic scale and the X axis is the date, starting from 1 March. I chose to plot this number for each country because deaths and population are numbers that are easy to understand and are less susceptible to misinterpretation.
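For anyone who wants to reproduce the metric, here is a minimal sketch of the calculation. The function name and the figures are made up for illustration; they are not the data behind the graph.

```python
def deaths_per_million(deaths: int, population: int) -> float:
    """Return deaths per million inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * 1_000_000


# Hypothetical figures, for illustration only (not real data).
example_rate = deaths_per_million(deaths=1_200, population=40_000_000)
print(round(example_rate, 1))
```

With these made-up numbers the rate comes out at about 30 deaths per million. Plotting that value per country and per day on a logarithmic Y axis gives the kind of graph described above.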

As of today, 23/3/20, Italy has suffered 91 deaths per million; on 04/03/20 its rate was 1.3 per million. Spain currently stands at 36.8 deaths per million, up from 1 death per million on 12/03/20.

What is interesting are the relative shapes of these graphs and how their steepness differs between countries. For example, the UK, where I live, currently stands at 4.2 deaths per million. Italy was at that rate on 09/03/20. If we assume the UK's growth continues along a curve similar to Italy's, we might conclude that we will be where Italy is now in 14 days' time. This may be behind the UK prime minister's statement that we are 2 weeks behind Italy.
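The 14-day lag can be checked with a few lines of Python using only the figures quoted above. The doubling-time estimate is my own addition and rests on a loud assumption: that Italy's rate grew as a constant exponential between the two dates, which the graph only roughly supports.

```python
from datetime import date
from math import log2

# Figures quoted in the post (deaths per million).
italy_at_uk_level = date(2020, 3, 9)   # Italy at roughly 4.2 per million
today = date(2020, 3, 23)              # UK at 4.2, Italy at 91
italy_then, italy_now = 4.2, 91.0

# How many days ago Italy was where the UK is today.
lag_days = (today - italy_at_uk_level).days

# Assumption: constant exponential growth between the two dates.
doublings = log2(italy_now / italy_then)
doubling_time_days = lag_days / doublings

print(f"UK lags Italy by {lag_days} days")
print(f"Implied doubling time for Italy: {doubling_time_days:.1f} days")
```

Under that assumption, Italy's rate doubled roughly every three days over the period, which is what a straight line on a logarithmic Y axis represents.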

Of note too is Iran, where the graph is diverging from Italy's, perhaps indicating that the measures taken in Iran are having an effect. Spain has a very steep graph and, if this continues, will soon overtake Italy as the most impacted country in terms of deaths per million of population. Belgium and the Netherlands are also rising very quickly, and the slope of their lines looks similar to that of the US.

Friday, 8 March 2019

I'm giving a talk about RapidMiner

I'll be giving a talk at the Data Science Reading Meetup group on the 26th March entitled "Introduction to RapidMiner". It's intended to be a brief introduction that should help people decide if RapidMiner is right for them.

I just discovered that it's possible to refer someone else to RapidMiner, and if that person installs the product, you get 10,000 extra rows in your license up to a maximum of 50,000.

There are 28 people going to the talk. How I wish I could have 10,000 rows for each referral I plan to send!

Saturday, 8 December 2018

Seeing how generated attributes are constructed

Sometimes, a "brute force feature generation and selection-athon" is irresistible.

I had a feeling that some data I was looking at contained hidden relationships between attributes that could have yielded improved prediction performance. I had a gut feel that dividing one attribute by another, or perhaps taking the log of one and adding it to the reciprocal of another, might give a new attribute with more predictive power. How could I do this without tiresome manual work that would have been boring, could have missed some permutations, and could have introduced mistakes?

There are a number of ways of doing this in RapidMiner. One approach uses one of the iterating operators, collectively known as YAGGA, to perform an evolutionary search. Each iteration generates new attributes by combining existing attributes using simple functions. The performance is assessed and attributes that don't lead to an improvement are eliminated whilst those that do are retained to allow them to generate yet more attributes. This process repeats until the desired stopping conditions have been reached.
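To make the generate-assess-prune loop concrete, here is a rough Python sketch. It is not RapidMiner's YAGGA implementation: it is a simplified greedy loop, and the fitness measure (absolute Pearson correlation with the label), the function set, the synthetic data and all the names are assumptions chosen for illustration.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Simple binary functions used to combine two attributes.
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 0.0,  # guard against division by zero
}


def generate_and_select(attributes, label, rounds=2, keep=4):
    """Greedy generate-then-prune loop over {expression: column} attributes."""
    pool = dict(attributes)
    for _ in range(rounds):
        candidates = dict(pool)
        # Generate: combine every ordered pair of surviving attributes.
        for (n1, c1), (n2, c2) in itertools.permutations(pool.items(), 2):
            for symbol, fn in FUNCTIONS.items():
                name = f"({n1} {symbol} {n2})"
                candidates[name] = [fn(a, b) for a, b in zip(c1, c2)]
        # Assess and prune: keep the attributes most correlated with the label.
        ranked = sorted(
            candidates,
            key=lambda k: abs(pearson(candidates[k], label)),
            reverse=True,
        )
        pool = {k: candidates[k] for k in ranked[:keep]}
    return pool


# Synthetic data with a hidden relationship: the label is a / b.
random.seed(0)
a = [random.uniform(1, 10) for _ in range(200)]
b = [random.uniform(1, 10) for _ in range(200)]
label = [x / y for x, y in zip(a, b)]

best = generate_and_select({"a": a, "b": b}, label)
top = next(iter(best))
print(top, round(abs(pearson(best[top], label)), 3))
```

Because each generated attribute is named after the expression that built it, the surviving attributes show their own construction. A real evolutionary search would add random mutation and a proper validation step rather than this simple ranking.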

For the masochist, there is a lower level operator called "Generate Function Set" that allows control to be exerted over the operation. I adopted this because I wanted to look in detail at the attributes that were leading to improvements and equally see those that led nowhere.

So I made a process. But then I got stuck because I found that there was no way, in the RapidMiner Studio GUI, to see what construction had been applied to generate new attributes. A bit of background: when RapidMiner generates new attributes, they show up with names of the form "gensymxxx". In the old days, there was a way of seeing the attribute construction from one of the viewing panes. Alas, it's not there anymore.

Luckily, there is an operator called "Write Constructions". This takes an example set and writes a file containing details of how its attributes were constructed. A bit laborious but workable.
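The idea of keeping a record that maps each opaque generated name back to its construction can be sketched outside RapidMiner too. The attribute names, expressions and CSV layout below are hypothetical; this is not the file format that "Write Constructions" actually produces.

```python
import csv
import io

# Hypothetical generated attributes: name -> construction expression.
constructions = {
    "gensym1": "att1 / att2",
    "gensym2": "log(att3) + 1 / att4",
}


def write_constructions(mapping, handle):
    """Write one 'attribute, construction' row per generated attribute."""
    writer = csv.writer(handle)
    writer.writerow(["attribute", "construction"])
    for name, expression in sorted(mapping.items()):
        writer.writerow([name, expression])


buffer = io.StringIO()  # stands in for a file on disk
write_constructions(constructions, buffer)
print(buffer.getvalue())
```

Passing an open file instead of the in-memory buffer would write the same rows to disk, so they can be inspected after the process has run.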

Did I find a new attribute that made an improvement? Yes, I did. It was a small improvement but enough to be interesting. The improvement is the sort of thing that would get you from the middle of the leaderboard to being a contender in a Kaggle competition.
