In my last post I wrote about visual data exploration with a focus on correlation, confidence, and spuriousness. As a reminder to aficionados, but mostly for new readers' benefit: I am using a very small toy dataset (only 21 observations) from the paper "Many correlation coefficients, null hypotheses, and high value" (Hunt, 2013).

The dependent/target variable is oil production (measured in tens of barrels of oil per day) from a marine barrier sand. The independent variables are:

- Gross pay, in meters
- Phi-h, porosity multiplied by thickness, with a 3% porosity cut-off
- Position within the reservoir (a ranked variable, with 1.0 representing the uppermost geological facies, 2.0 the middle one, 3.0 the lowest one)
- Pressure draw-down, in MPa

Three additional 'special' variables are: Random 1 and Random 2, which are range bound and random, and were included in the paper; and Gross pay transform, which I created specifically for this exercise to be highly correlated to Gross pay, by passing Gross pay to a logarithmic function and then adding a bit of normally distributed random noise.

I am very pleased with having been able to put together, by the end of it, a good-looking scatter matrix that incorporated:

- bivariate scatter-plots in the upper triangle, annotated with rank correlation coefficient, confidence interval, and probability of spurious correlation
- shape of the bivariate distributions (KDE) on the diagonal

In a comment to the post, Matt Hall got me thinking about other ways to visualize the correlation coefficient. I did not end up using a colourmap for the facecolour of the plots: although this would probably be relatively easy, in an earlier attempt using hex-bin plots, scaling the colourmap of each plot independently (to account for outliers) proved challenging. But after some digging I found the Biokit library, which comes with a lot of useful visualizations, among which corrplot is exactly what I was looking for.

With only a bit of tinkering I was able to produce a correlation matrix, shown in Figure 1, with:

- correlation coefficient in the upper triangle (colour and intensity indicate whether the correlation is positive or negative, and its strength, respectively)
- bivariate ellipses in the lower triangle (ellipse direction and colour indicate whether the correlation is positive or negative; ellipticity and colour intensity are proportional to the correlation coefficient)

Figure 1. Correlation matrix using the Biokit library.

Also notice that, quite conveniently, the correlation matrix of Figure 1 is reordered with strongly correlated variables adjacent to one another, which facilitates interpretation.
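As a rough illustration of how a variable like Gross pay transform could be constructed and checked, here is a minimal sketch using only the Python standard library. It is not the code used for this post: the gross pay values are made up (they are not the Hunt, 2013 data), and the noise level, the seed, and the helper names (`ranks`, `pearson`, `spearman`, `log_transform_with_noise`) are all assumptions chosen for the example.

```python
import math
import random


def ranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def log_transform_with_noise(values, noise_sd, rng):
    """Log-transform each value, then add zero-mean Gaussian noise."""
    return [math.log(v) + rng.gauss(0.0, noise_sd) for v in values]


rng = random.Random(42)  # fixed seed (an assumption) for reproducibility

# Hypothetical gross pay values in meters, NOT the data from Hunt (2013).
gross_pay = [0.5 + 1.0 * i for i in range(21)]
gross_pay_transform = log_transform_with_noise(gross_pay, noise_sd=0.1, rng=rng)

rho = spearman(gross_pay, gross_pay_transform)
print(f"Spearman rho, Gross pay vs. transform: {rho:.2f}")
```

Because the logarithm is monotonic, the rank correlation would be exactly 1 without the noise; the noise term is what pulls it slightly below 1, and a larger `noise_sd` weakens the relationship further.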