References

AAB+15

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: large-scale machine learning on heterogeneous systems. 2015. Software available from tensorflow.org. URL: https://www.tensorflow.org/.

ATHJ21

Manuel Anglada-Tort, Peter M. C. Harrison, and Nori Jacoby. REPP: a robust cross-platform solution for online sensorimotor synchronization experiments. bioRxiv, 2021.

BKK18

Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. CoRR, 2018. URL: http://arxiv.org/abs/1803.01271, arXiv:1803.01271.

Bot91

Léon Bottou. Stochastic gradient learning in neural networks. Proceedings of Neuro-Nîmes, 91(8):12, 1991.

BockKW14a

S. Böck, F. Krebs, and G. Widmer. A multi-model approach to beat tracking considering heterogeneous music styles. In 15th Conf. of the Int. Soc. for Music Information Retrieval (ISMIR 2014), 603–608. Taipei, Taiwan, October 2014.

BockD20

Sebastian Böck and Matthew EP Davies. Deconstruct, analyse, reconstruct: how to improve tempo, beat, and downbeat estimation. In Proc. of the 21st Int. Society for Music Information Retrieval Conf. (ISMIR), 574–582. Montreal, Canada, 2020.

BockDK19

Sebastian Böck, Matthew EP Davies, and Peter Knees. Multi-task learning of tempo and beat: learning one to improve the other. In ISMIR, 486–493. 2019.

BockKW14b

Sebastian Böck, Florian Krebs, and Gerhard Widmer. A multi-model approach to beat tracking considering heterogeneous music styles. In Proc. of the 15th Intl. Society for Music Information Retrieval Conf. (ISMIR), 603–608. Taipei, Taiwan, 2014.

BockS11

Sebastian Böck and Markus Schedl. Enhanced beat tracking with context-aware neural networks. In Proc. Int. Conf. Digital Audio Effects, 135–139. 2011.

BockKW16

S. Böck, F. Krebs, and G. Widmer. Joint beat and downbeat tracking with recurrent neural networks. In 17th International Society for Music Information Retrieval Conference (ISMIR). 2016.

CFG18

Tian Cheng, Satoru Fukayama, and Masataka Goto. Convolving Gaussian kernels for RNN-based beat tracking. In 2018 26th European Signal Processing Conference (EUSIPCO), 1905–1909. IEEE, 2018.

CVMerrienboerG+14

K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.

CFCS17

Keunwoo Choi, György Fazekas, Kyunghyun Cho, and Mark Sandler. A tutorial on deep learning for music information retrieval. arXiv preprint arXiv:1709.04396, 2017.

C+15

François Chollet and others. Keras. https://keras.io, 2015.

DDP09

Matthew EP Davies, Norberto Degara, and Mark D Plumbley. Evaluation methods for musical audio beat tracking algorithms. Queen Mary University of London, Centre for Digital Music, Tech. Rep. C4DM-TR-09-06, 2009.

DRuaP+12

Norberto Degara, Enrique Argones Rúa, Antonio Pena, Soledad Torres-Guijarro, Matthew EP Davies, and Mark D Plumbley. Reliability-informed beat tracking of musical signals. IEEE Transactions on Audio, Speech, and Language Processing, 20(1):290–301, 2012.

DSdHMuller19

Jonathan Driedger, Hendrik Schreiber, W. Bas de Haas, and Meinard Müller. Towards automatically correcting tapped beat annotations for music recordings. In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR). Delft, The Netherlands, November 2019.

DBDR15

S. Durand, J. P. Bello, B. David, and G. Richard. Downbeat tracking with multiple features and deep neural networks. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2015. doi:10.1109/ICASSP.2015.7178001.

DBDR16

S. Durand, J. P. Bello, B. David, and G. Richard. Feature adapted convolutional neural networks for downbeat tracking. In IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP). 2016.

DBDR17

S. Durand, J. P. Bello, B. David, and G. Richard. Robust downbeat tracking using an ensemble of convolutional networks. IEEE/ACM Trans. on Audio, Speech, and Language Processing, 25(1):76–89, January 2017.

DE16

S. Durand and S. Essid. Downbeat detection with conditional random fields and deep learned features. In 17th Int. Soc. for Music Information Retrieval Conf. (ISMIR 2016), 386–392. New York, USA, August 2016.

DDR14

Simon Durand, Bertrand David, and Gaël Richard. Enhancing downbeat detection when facing different music styles. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3132–3136. IEEE, 2014.

Ell07

Daniel PW Ellis. Beat tracking by dynamic programming. Journal of New Music Research, 36(1):51–60, 2007.

FJDE15

T. Fillon, C. Joder, S. Durand, and S. Essid. A conditional random field system for beat tracking. In IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 424–428. South Brisbane, Australia, April 2015.

FMR+19

M. Fuentes, L. S. Maia, M. Rocamora, L. W. P. Biscainho, H. C. Crayencour, S. Essid, and J. P. Bello. Tracking beats and microtiming in afro-latin american music using conditional random fields and deep learning. In 20th International Society for Music Information Retrieval Conference, ISMIR. 2019.

FMC+19

M. Fuentes, B. McFee, H.C. Crayencour, S. Essid, and J.P. Bello. A music structure informed downbeat tracking system using skip-chain conditional random fields and deep learning. In 44th Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 481–485. Brighton, UK, May 2019.

FMC+18

Magdalena Fuentes, Brian McFee, Hélène Crayencour, Slim Essid, and Juan Bello. Analysis of common design choices in deep learning systems for downbeat tracking. In The 19th International Society for Music Information Retrieval Conference. 2018.

GBC16

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT Press, 2016.

Got01

Masataka Goto. An audio-based real-time beat tracking system for music with or without drum-sounds. Journal of New Music Research, 30(2):159–171, 2001.

GTH19

Alexander Greaves-Tunnell and Zaid Harchaoui. A statistical investigation of long memory in language and music. arXiv preprint arXiv:1904.03834, 2019.

GSKoutnik+16

Klaus Greff, Rupesh K Srivastava, Jan Koutník, Bas R Steunebrink, and Jürgen Schmidhuber. LSTM: a search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10):2222–2232, 2016.

HM04

Stephen W Hainsworth and Malcolm D Macleod. Particle filtering applied to musical tempo tracking. EURASIP Journal on Advances in Signal Processing, 2004(15):927847, 2004.

HZCPerpinan04

Xuming He, Richard S Zemel, and Miguel Á Carreira-Perpiñán. Multiscale conditional random fields for image labeling. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., volume 2, II–II. IEEE, 2004.

HS97

Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.

HDF12

Jason Hockman, Matthew EP Davies, and Ichiro Fujinaga. One in the jungle: downbeat detection in hardcore, jungle, and drum and bass. In ISMIR, 169–174. 2012.

HKS14

A. Holzapfel, F. Krebs, and A. Srinivasamurthy. Tracking the 'odd': meter inference in a culturally diverse music corpus. In 15th Int. Society for Music Information Retrieval Conf. (ISMIR), 425–430. Taipei, Taiwan, October 2014.

HG16

André Holzapfel and Thomas Grill. Bayesian meter tracking on learned signal representations. In Proc. of the 17th Int. Society for Music Information Retrieval Conf. (ISMIR), 262–268. 2016.

HDZ+12

André Holzapfel, Matthew E. P. Davies, José R. Zapata, João Lobato Oliveira, and Fabien Gouyon. Selective sampling for beat tracking evaluation. IEEE Transactions on Audio, Speech, and Language Processing, 20(9):2539–2548, 2012. doi:10.1109/TASL.2012.2205244.

IS15

S. Ioffe and C. Szegedy. Batch normalization: accelerating deep network training by reducing internal covariate shift. In 32nd International Conference on Machine Learning (ICML). 2015.

JLL19

Bijue Jia, Jiancheng Lv, and Dayiheng Liu. Deep learning-based automatic downbeat tracking: a brief review. Multimedia Systems, pages 1–22, 2019.

JZS15

Rafal Jozefowicz, Wojciech Zaremba, and Ilya Sutskever. An empirical exploration of recurrent network architectures. In International Conference on Machine Learning, 2342–2350. 2015.

KFRO12

Maksim Khadkevich, Thomas Fillon, Gaël Richard, and Maurizio Omologo. A probabilistic approach to simultaneous extraction of beats and downbeats. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 445–448. IEEE, 2012.

KB14

D. Kingma and J. Ba. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

KEA06

A.P. Klapuri, A.J. Eronen, and J.T. Astola. Analysis of the meter of acoustic musical signals. IEEE Transactions on Audio, Speech, and Language Processing, 14(1):342–355, 2006. doi:10.1109/TSA.2005.854090.

KFH+15

Peter Knees, Angel Faraldo, Perfecto Herrera, Richard Vogl, Sebastian Böck, Florian Hörschläger, and Mickael Le Goff. Two data sets for tempo estimation and key detection in electronic dance music annotated from user corrections. In Proc. of the 16th Intl. Society for Music Information Retrieval Conf. (ISMIR), 364–370. 2015.

KBockW14

Filip Korzeniowski, Sebastian Böck, and Gerhard Widmer. Probabilistic extraction of beat positions from a beat activation function. In ISMIR, 513–518. 2014.

KBockW11

F. Krebs, S. Böck, and G. Widmer. An efficient state space model for joint tempo and meter tracking. In 16th International Society for Music Information Retrieval Conference (ISMIR). 2015.

KBockDW16

F. Krebs, S. Böck, M. Dorfer, and G. Widmer. Downbeat tracking using beat synchronous features with recurrent neural networks. In 17th International Society for Music Information Retrieval Conference (ISMIR). 2016.

KBockW13

Florian Krebs, Sebastian Böck, and Gerhard Widmer. Rhythmic pattern modeling for beat and downbeat tracking in musical audio. In ISMIR, 227–232. 2013.

KHCW15

Florian Krebs, Andre Holzapfel, Ali Taylan Cemgil, and Gerhard Widmer. Inferring metrical structure in music using particle filters. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(5):817–827, 2015.

KH04

Sanjiv Kumar and Martial Hebert. Discriminative fields for modeling spatial dependencies in natural images. In Advances in Neural Information Processing Systems, 1531–1538. 2004.

Lon04

J. London. Hearing in Time: Psychological Aspects of Musical Meter. Oxford University Press, New York, USA, 2004.

MBock19

Matthew EP Davies and Sebastian Böck. Temporal convolutional networks for musical audio beat tracking. In 2019 27th European Signal Processing Conference (EUSIPCO), 1–5. IEEE, 2019.

McF18

Brian McFee. Statistical methods for scene and event classification. In Computational Analysis of Sound Scenes and Events, pages 103–146. Springer International Publishing, Cham, 2018. URL: https://doi.org/10.1007/978-3-319-63450-0_5, doi:10.1007/978-3-319-63450-0_5.

MB17

Brian McFee and Juan P. Bello. Structured training for large-vocabulary chord recognition. In 18th International Society for Music Information Retrieval Conference, ISMIR. 2017.

MR02

missing journal in murphy2002dynamic

NH10

Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 807–814. 2010.

NRJB15

L. Nunes, M. Rocamora, L. Jure, and L. W. P. Biscainho. Beat and downbeat tracking based on rhythmic patterns applied to the uruguayan candombe drumming. In 16th Int. Soc. for Music Information Retrieval Conf. (ISMIR), 264–270. Málaga, Spain, October 2015.

PT16

Hélène Papadopoulos and George Tzanetakis. Models for music analysis from a Markov logic networks perspective. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(1):19–34, 2016.

PP10a

Hélène Papadopoulos and Geoffroy Peeters. Joint estimation of chords and downbeats from an audio signal. IEEE Transactions on Audio, Speech, and Language Processing, 19(1):138–152, 2010.

PHV16

Giambattista Parascandolo, Heikki Huttunen, and Tuomas Virtanen. Recurrent neural networks for polyphonic sound event detection in real life recordings. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6440–6444. IEEE, 2016.

PP10b

Geoffroy Peeters and Hélène Papadopoulos. Simultaneous beat and downbeat-tracking using a probabilistic framework: theory and large-scale evaluation. IEEE Transactions on Audio, Speech, and Language Processing, 19(6):1754–1769, 2010.

PBockCD21

António S Pinto, Sebastian Böck, Jaime S Cardoso, and Matthew EP Davies. User-driven fine-tuning for beat tracking. Electronics, 10(13):1518, 2021.

PLV+19

Hendrik Purwins, Bo Li, Tuomas Virtanen, Jan Schlüter, Shuo-Yiin Chang, and Tara Sainath. Deep learning for audio signal processing. IEEE Journal of Selected Topics in Signal Processing, 13(2):206–219, 2019.

Rab89

Lawrence R Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.

Sch98

Eric D Scheirer. Tempo and beat analysis of acoustic musical signals. The Journal of the Acoustical Society of America, 103(1):588–601, 1998.

SchluterBock14

Jan Schlüter and Sebastian Böck. Improved musical onset detection with convolutional neural networks. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6979–6983. IEEE, 2014.

SUM20

H. Schreiber, J. Urbano, and M. Müller. Music tempo estimation: are we done yet? Transactions of the International Society for Music Information Retrieval, 3(1):111–123, 2020. doi:10.5334/tismir.43.

SP97

Mike Schuster and Kuldip K Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681, 1997.

Set05

Burr Settles. ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text. Bioinformatics, 21(14):3191–3192, 2005.

SP03

Fei Sha and Fernando Pereira. Shallow parsing with conditional random fields. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1, 134–141. Association for Computational Linguistics, 2003.

SBD16

Siddharth Sigtia, Emmanouil Benetos, and Simon Dixon. An end-to-end neural network for polyphonic piano music transcription. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(5):927–939, 2016.

SjobergL95

Jonas Sjöberg and Lennart Ljung. Overtraining, regularization and searching for a minimum, with application to neural networks. International Journal of Control, 62(6):1391–1407, 1995.

SHCS15

A. Srinivasamurthy, A. Holzapfel, A. T. Cemgil, and X. Serra. Particle filters for efficient meter tracking with dynamic bayesian networks. In 16th Int. Society for Music Information Retrieval Conf. (ISMIR). 2015. URL: http://hdl.handle.net/10230/34998.

SHCS16

A. Srinivasamurthy, A. Holzapfel, A. T. Cemgil, and X. Serra. A generalized bayesian model for tracking long metrical cycles in acoustic music signals. In IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 76–80. Shanghai, China, March 2016.

SHS17

Ajay Srinivasamurthy, André Holzapfel, and Xavier Serra. Informed automatic meter analysis of music recordings. In Proc. of the 18th Int. Society for Music Information Retrieval Conf. (ISMIR). 2017.

SHS14

Ajay Srinivasamurthy, André Holzapfel, and Xavier Serra. In search of automatic rhythm analysis methods for Turkish and Indian art music. Journal of New Music Research, 43(1):94–114, 2014.

SR21

Christian J Steinmetz and Joshua D Reiss. WaveBeat: end-to-end beat and downbeat tracking in the time domain. arXiv preprint arXiv:2110.01436, 2021.

SM06

C. Sutton and A. McCallum. An introduction to conditional random fields for relational learning. In Lise Getoor and Ben Taskar, editors, Introduction to Statistical Relational Learning, chapter 4, pages 93–128. MIT Press, Cambridge, USA, 2006.

SM12

Charles Sutton and Andrew McCallum. An introduction to conditional random fields. Foundations and Trends® in Machine Learning, 4(4):267–373, 2012.

USchluterG14

Karen Ullrich, Jan Schlüter, and Thomas Grill. Boundary detection in music structure analysis using convolutional neural networks. In ISMIR, 417–422. 2014.

VDWK17

Richard Vogl, Matthias Dorfer, Gerhard Widmer, and Peter Knees. Drum transcription via joint beat and drum modeling using convolutional recurrent neural networks. In ISMIR, 150–157. 2017.

W+90

Paul J Werbos and others. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550–1560, 1990.

WCG06

N. Whiteley, A. T. Cemgil, and S. J. Godsill. Bayesian modelling of temporal structure in musical audio. In 7th Int. Society for Music Information Retrieval Conf. (ISMIR). 2006.

ZNY19

missing journal in zahraybeat

ZDGomez14

José R Zapata, Matthew EP Davies, and Emilia Gómez. Multi-feature beat tracking. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(4):816–825, 2014.

ZZH+17

Chen Zhu, Yanpeng Zhao, Shuaiyi Huang, Kewei Tu, and Yi Ma. Structured attentions for visual question answering. In Proceedings of the IEEE International Conference on Computer Vision, 1291–1300. 2017.

TheanoDTeam16

Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, May 2016. URL: http://arxiv.org/abs/1605.02688.