Apart from probabilistic models, we have developed a unified approach that describes data as predicates encoding deep domain knowledge. It makes it possible to express epigenetic changes as logical expressions over histone modification, methylation, and transcription data.
We created a hybrid algorithm that combines Association Rule Learning with Information Theory to build Ishikawa diagrams automatically. This made it possible to find causation-like associations in epigenetic data.
Association Rule Mining (ARM) is a data mining technique for discovering hidden dependencies in observational data. Fishbone ARM is a novel approach that combines bottom-up rule mining with information theory and multiple testing correction, yielding statistically significant and interpretable results. Visualization with Ishikawa diagrams helps to understand the resulting relationships and provides rich capabilities for data filtering.
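To make the idea of rules over predicates more concrete, here is a minimal Python sketch. It is not the Fishbone implementation: it only treats predicates as boolean functions over records and computes the standard support, confidence, and lift of a rule. The feature names (`H3K4me3`, `methylated`, `expressed`) and the toy records are invented for illustration.

```python
from typing import Callable, Dict, List

Record = Dict[str, bool]
Predicate = Callable[[Record], bool]

# Hypothetical toy data: one record per genomic region, boolean features.
RECORDS: List[Record] = [
    {"H3K4me3": True, "methylated": False, "expressed": True},
    {"H3K4me3": True, "methylated": False, "expressed": True},
    {"H3K4me3": True, "methylated": True, "expressed": False},
    {"H3K4me3": False, "methylated": True, "expressed": False},
    {"H3K4me3": False, "methylated": False, "expressed": False},
]


def support(records: List[Record], predicate: Predicate) -> float:
    """Fraction of records on which the predicate holds."""
    return sum(1 for r in records if predicate(r)) / len(records)


def confidence(records: List[Record], condition: Predicate, target: Predicate) -> float:
    """Fraction of records satisfying the condition that also satisfy the target."""
    covered = [r for r in records if condition(r)]
    if not covered:
        return 0.0
    return sum(1 for r in covered if target(r)) / len(covered)


def lift(records: List[Record], condition: Predicate, target: Predicate) -> float:
    """Confidence of the rule relative to the base rate of the target."""
    base = support(records, target)
    return confidence(records, condition, target) / base if base > 0 else 0.0


# Rule: (H3K4me3 AND NOT methylated) => expressed
def condition(r: Record) -> bool:
    return r["H3K4me3"] and not r["methylated"]


def target(r: Record) -> bool:
    return r["expressed"]


print(support(RECORDS, condition))             # share of regions covered by the condition
print(confidence(RECORDS, condition, target))  # how often the target holds when covered
print(lift(RECORDS, condition, target))        # gain over the base rate of the target
```

Writing conditions as plain functions keeps arbitrary logical expressions (AND, OR, NOT) easy to compose, at the cost of re-scanning the records for every rule.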
"ARM using Fishbone diagram" is a follow-up project carried out by students at the Bioinformatics Institute. The student tasks were:
- Investigate and improve the algorithms
- Improve the visualization web service
- Evaluate the algorithm on the Ciofani dataset
Figure 1 shows the web interface of the service.
This service implements a novel approach to mining association rules in a given dataset.
It also filters out unproductive rules according to the 'improvement' metric, with a corresponding significance check performed using a holdout approach.
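The filtering step can be sketched as follows. This is an illustration under stated assumptions, not the service's actual code: it assumes that 'improvement' means the confidence gain of a rule over its best proper sub-rule (a rule is unproductive when this gain is not positive), that significance is assessed on the holdout part with a one-sided Fisher exact test of the rule against the base rate, and that a Bonferroni correction is applied. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import comb


def confidence(rows, conditions, target):
    """Share of rows satisfying all conditions that also satisfy the target."""
    covered = [r for r in rows if all(r[c] for c in conditions)]
    if not covered:
        return 0.0
    return sum(r[target] for r in covered) / len(covered)


def improvement(rows, conditions, target):
    """Confidence gain over the best proper sub-rule (empty condition included)."""
    best_sub = max(
        (confidence(rows, subset, target)
         for k in range(len(conditions))
         for subset in combinations(conditions, k)),
        default=0.0,
    )
    return confidence(rows, conditions, target) - best_sub


def fisher_p_value(rows, conditions, target):
    """One-sided Fisher exact test: P(at least this many target hits by chance)."""
    covered = [r for r in rows if all(r[c] for c in conditions)]
    n, hits = len(covered), sum(r[target] for r in covered)
    total, positives = len(rows), sum(r[target] for r in rows)
    tail = sum(comb(positives, x) * comb(total - positives, n - x)
               for x in range(hits, min(n, positives) + 1))
    return tail / comb(total, n)


def mine_with_holdout(exploratory, holdout, candidates, target, alpha=0.05):
    """Keep productive rules on the exploratory part, then test them on holdout."""
    productive = [c for c in candidates if improvement(exploratory, c, target) > 0]
    if not productive:
        return []
    threshold = alpha / len(productive)  # Bonferroni correction (an assumption)
    tested = [(c, fisher_p_value(holdout, c, target)) for c in productive]
    return [(c, p) for c, p in tested if p <= threshold]


# Toy data: A determines T, B is unrelated noise.
ROWS = [{"A": i < 40, "B": (i // 2) % 2 == 0, "T": i < 40} for i in range(80)]
EXPLORATORY, HOLDOUT = ROWS[0::2], ROWS[1::2]
CANDIDATES = [("A",), ("B",), ("A", "B")]

for rule, p in mine_with_holdout(EXPLORATORY, HOLDOUT, CANDIDATES, "T"):
    print(rule, p)  # only ("A",) survives; ("B",) and ("A", "B") are unproductive
```

Testing only the rules that pass the filter on the exploratory part keeps the number of hypotheses small, so the correction on the holdout part is less severe than it would be over the full search space.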
The following scheme represents the rule mining workflow:
Daria Likholetova and Nina Lukashina worked on this project during a summer internship under the mentorship of Peter Tsurinov and Oleg Shpynov. Daria worked on the biological interpretation of the algorithm's results as well as on data preparation and analysis. Nina focused mostly on algorithm and web service development. They achieved the following results:
- Improved the algorithm by adding the LOE criterion, a measure of interestingness, and a statistical significance check
- Improved service usability through a better UI
- Successfully validated the new approach on the Ciofani dataset
- Obtained rules that have a reasonable biological meaning
These findings suggest that the method can produce novel biological knowledge from observational data and provide rich visualization and analytic capabilities.