Nucleosome positioning plays an important role in gene regulation, since it controls the accessibility of DNA for transcription factor binding and affects the speed of transcription. However, nucleosome positioning is not determined by DNA sequence alone. To evaluate how strongly the local DNA sequence supports a nucleosome, we developed a random forest classifier that distinguishes nucleosomal DNA from linker DNA. The classifier transforms the sequences in FASTA files into a frequency spectrum of the local DNAshape. A basic version of this classifier and further documentation can be found at the GitLab link in the classification section.
We further applied this classifier to the whole genome in a sliding-window approach with a step size of 50 bp and a binary outcome of 0 (linker) or 1 (nucleosomal). The result can be downloaded in the resources section under the name NFScore50. Additionally, we created a more detailed version for all human promoters (RefSeq, ± 1.5 kb around each unique TSS) at a resolution of 7 bp, with a score between 0 and 1, under the name NFScore7. For each base pair, this score is the relative frequency of nucleosomal predictions among all sliding windows that contained that base pair.
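The per-base score described above can be illustrated with a minimal sketch. It is not the published implementation: the function name `per_base_nucleosome_score`, the `classify` callback (a stand-in for the trained random forest), and the default window length of 147 bp are assumptions made for illustration only. The step size of 7 bp follows the resolution stated for NFScore7.

```python
from typing import Callable, List, Optional


def per_base_nucleosome_score(
    sequence: str,
    classify: Callable[[str], int],
    window: int = 147,  # assumed window length; not stated in the text
    step: int = 7,      # resolution stated for NFScore7
) -> List[Optional[float]]:
    """Fraction of covering windows predicted as nucleosomal, per base.

    `classify` is a hypothetical stand-in for the classifier and must
    return 1 (nucleosomal) or 0 (linker) for a window sequence.
    Bases not covered by any window receive None.
    """
    n = len(sequence)
    hits = [0] * n    # nucleosomal predictions covering each base
    counts = [0] * n  # total windows covering each base

    for start in range(0, n - window + 1, step):
        label = classify(sequence[start:start + window])
        for i in range(start, start + window):
            counts[i] += 1
            hits[i] += label

    return [h / c if c else None for h, c in zip(hits, counts)]


if __name__ == "__main__":
    # Toy classifier for demonstration only: calls a window "nucleosomal"
    # when at least half of its bases are G.
    toy = lambda w: 1 if w.count("G") * 2 >= len(w) else 0
    print(per_base_nucleosome_score("A" * 10 + "G" * 10, toy, window=4, step=2))
```

With a step size equal to the window spacing used for the genome-wide track, the same loop reduces to one binary call per window, which corresponds to the 0/1 outcome of NFScore50.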
Code for the classification tool and the accompanying documentation can be found at:
Currently under submission. More soon...
The raw FASTA files for training the classifier are taken from:
Shou-Hui Guo, En-Ze Deng, Li-Qin Xu, Hui Ding, Hao Lin, Wei Chen, and Kuo-Chen Chou. iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition. Bioinformatics, 30(11):1522–1529, February 2014.
Their data is in turn derived from:
Dustin E. Schones, Kairong Cui, Suresh Cuddapah, Tae Young Roh, Artem Barski, Zhibin Wang, Gang Wei, and Keji Zhao. Dynamic regulation of nucleosome positioning in the human genome. Cell, 132(5):887–898, March 2008.