Apart from probabilistic models, we have developed a unified approach that describes data as predicates encoding deep domain knowledge. It makes it possible to express epigenetic changes as logical expressions over histone modification, methylation, and transcription data.
We created a hybrid algorithm that combines association rule learning with information theory to build an Ishikawa diagram automatically. This made it possible to find causation-like associations in epigenetic data.
Association Rule Mining (ARM) is a data mining technique for exploring hidden dependencies in observational data. Fishbone ARM is a novel approach that combines bottom-up rule mining with information theory and multiple testing correction, producing statistically significant and interpretable results. Visualization with Ishikawa diagrams helps in understanding the resulting relationships and provides rich capabilities for data filtering.
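To make the idea of rules over predicates concrete, here is a minimal sketch of how the support and confidence of a single rule could be computed. It is not the Fishbone implementation: the toy records, the field names (`H3K4me3`, `methylated`, `expressed`), and the helper functions are assumptions made up for illustration.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, bool]
Predicate = Callable[[Record], bool]

# Toy observations, invented for illustration only (not real measurements).
records: List[Record] = [
    {"H3K4me3": True,  "methylated": False, "expressed": True},
    {"H3K4me3": True,  "methylated": False, "expressed": True},
    {"H3K4me3": True,  "methylated": True,  "expressed": False},
    {"H3K4me3": False, "methylated": True,  "expressed": False},
    {"H3K4me3": False, "methylated": False, "expressed": True},
    {"H3K4me3": True,  "methylated": False, "expressed": False},
]


def holds(record: Record, predicates: Sequence[Predicate]) -> bool:
    """True if the record satisfies every predicate (logical AND)."""
    return all(p(record) for p in predicates)


def support(data: Sequence[Record], predicates: Sequence[Predicate]) -> float:
    """Fraction of records that satisfy all predicates."""
    if not data:
        return 0.0
    return sum(holds(r, predicates) for r in data) / len(data)


def confidence(data: Sequence[Record],
               conditions: Sequence[Predicate],
               target: Predicate) -> float:
    """Among records satisfying the conditions, the fraction satisfying the target."""
    matched = [r for r in data if holds(r, conditions)]
    if not matched:
        return 0.0
    return sum(target(r) for r in matched) / len(matched)


# Hypothetical predicates over the toy fields.
def has_mark(r: Record) -> bool:
    return r["H3K4me3"]


def unmethylated(r: Record) -> bool:
    return not r["methylated"]


def is_expressed(r: Record) -> bool:
    return r["expressed"]


# Rule: H3K4me3 AND NOT methylated => expressed
rule_support = support(records, [has_mark, unmethylated, is_expressed])
rule_confidence = confidence(records, [has_mark, unmethylated], is_expressed)
print(f"support={rule_support:.3f} confidence={rule_confidence:.3f}")
```

On this toy table, three records satisfy the left-hand side and two of them are expressed, so the rule has a confidence of 2/3 and a support of 2/6.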
"ARM using Fishbone diagram" is a follow-up project carried out by students at the Bioinformatics Institute. Student tasks were:
- Investigate and improve algorithms
- Improve the visualization web service
- Evaluate the algorithm on the Ciofani dataset
This service implements a novel approach to mine association rules in specified data.
It also filters out unproductive rules according to the 'improvement' metric, with a corresponding significance check. The significance check is performed using the holdout approach.
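The filtering step can be sketched as follows. The sketch assumes the common definition of improvement: the smallest difference between the confidence of a rule and the confidence of any of its proper sub-rules with the same target. The statistical test used by the service is not described here, so this example only re-evaluates improvement on a held-out part of the data instead of running a real significance test. All names and the toy records are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, bool]
Rule = Tuple[str, ...]  # names of boolean predicates on the left-hand side


def confidence(records: Sequence[Record], conditions: Rule, target: str) -> float:
    """Fraction of records matching all conditions that also satisfy the target."""
    matched = [r for r in records if all(r[c] for c in conditions)]
    if not matched:
        return 0.0
    return sum(r[target] for r in matched) / len(matched)


def improvement(records: Sequence[Record], conditions: Rule, target: str) -> float:
    """Minimum confidence gain of the rule over every proper sub-rule."""
    if not conditions:
        raise ValueError("a rule needs at least one condition")
    full = confidence(records, conditions, target)
    return min(
        full - confidence(records, subset, target)
        for size in range(len(conditions))  # proper subsets, including the empty one
        for subset in combinations(conditions, size)
    )


def keep_productive(rules: Sequence[Rule], train: Sequence[Record],
                    holdout: Sequence[Record], target: str,
                    min_improvement: float = 0.0) -> List[Rule]:
    """Keep rules whose improvement exceeds the threshold on both data splits."""
    return [
        rule for rule in rules
        if improvement(train, rule, target) > min_improvement
        and improvement(holdout, rule, target) > min_improvement
    ]


def row(mark: bool, unmethylated: bool, expressed: bool) -> Record:
    return {"H3K4me3": mark, "unmethylated": unmethylated, "expressed": expressed}


# Toy data, invented for illustration only.
train = [
    row(True, True, True), row(True, True, True),
    row(True, False, False), row(True, False, True),
    row(False, True, False), row(False, True, True),
    row(False, False, False), row(False, False, False),
]
holdout = [
    row(True, True, True), row(True, False, True),
    row(False, True, False), row(False, False, False),
]

candidates: List[Rule] = [("H3K4me3", "unmethylated"), ("H3K4me3",)]
train_improvement = improvement(train, candidates[0], "expressed")
kept = keep_productive(candidates, train, holdout, "expressed")
print(train_improvement, kept)
```

In this toy example the two-condition rule looks productive on the training part, but on the held-out part it adds nothing over the shorter rule, so only the shorter rule is kept.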
The following scheme illustrates the rule mining workflow:
Daria Likholetova and Nina Lukashina worked on this project during a summer internship under the mentorship of Peter Tsurinov and Oleg Shpynov. Daria worked on the biological interpretation of the algorithm’s results as well as on data preparation and analysis. Nina focused mostly on algorithm and web service development. They achieved the following results:
- Improved the algorithm by adding the LOE criterion, a measure of interestingness, and a statistical significance check
- Improved service usability through a better UI
- Successfully validated the new approach on the Ciofani dataset
- Obtained rules that have a reasonable biological meaning
These findings suggest that the method can produce novel biological knowledge from observational data and provide rich visualization and analytic capabilities.