References

AAB+15

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: large-scale machine learning on heterogeneous systems. 2015. Software available from tensorflow.org. URL: https://www.tensorflow.org/.

ATHJ21

Manuel Anglada-Tort, Peter M. C. Harrison, and Nori Jacoby. REPP: a robust cross-platform solution for online sensorimotor synchronization experiments. bioRxiv, 2021.

BKK18

Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. CoRR, 2018. URL: http://arxiv.org/abs/1803.01271, arXiv:1803.01271.

Bot91

Léon Bottou. Stochastic gradient learning in neural networks. Proceedings of Neuro-Nîmes, 91(8):12, 1991.

BockKW14a

S. Böck, F. Krebs, and G. Widmer. A multi-model approach to beat tracking considering heterogeneous music styles. In 15th Conf. of the Int. Soc. for Music Information Retrieval (ISMIR 2014), 603–608. Taipei, Taiwan, October 2014.

BockD20

Sebastian Böck and Matthew EP Davies. Deconstruct, analyse, reconstruct: how to improve tempo, beat, and downbeat estimation. In Proc. of the 21st Int. Society for Music Information Retrieval Conf. (ISMIR), 574–582. Montreal, Canada, 2020.

BockDK19

Sebastian Böck, Matthew EP Davies, and Peter Knees. Multi-task learning of tempo and beat: learning one to improve the other. In ISMIR, 486–493. 2019.

BockKW14b

Sebastian Böck, Florian Krebs, and Gerhard Widmer. A multi-model approach to beat tracking considering heterogeneous music styles. In Proc. of the 15th Intl. Society for Music Information Retrieval Conf. (ISMIR), 603–608. Taipei, Taiwan, 2014.

BockS11

Sebastian Böck and Markus Schedl. Enhanced beat tracking with context-aware neural networks. In Proc. Int. Conf. Digital Audio Effects, 135–139. 2011.

BockKW16

S. Böck, F. Krebs, and G. Widmer. Joint beat and downbeat tracking with recurrent neural networks. In 17th International Society for Music Information Retrieval Conference (ISMIR). 2016.

CFG18

Tian Cheng, Satoru Fukayama, and Masataka Goto. Convolving Gaussian kernels for RNN-based beat tracking. In 2018 26th European Signal Processing Conference (EUSIPCO), 1905–1909. IEEE, 2018.

CVMerrienboerG+14

K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.

CFCS17

Keunwoo Choi, György Fazekas, Kyunghyun Cho, and Mark Sandler. A tutorial on deep learning for music information retrieval. arXiv preprint arXiv:1709.04396, 2017.

C+15

François Chollet and others. Keras. https://keras.io, 2015.

DDP09

Matthew EP Davies, Norberto Degara, and Mark D Plumbley. Evaluation methods for musical audio beat tracking algorithms. Queen Mary University of London, Centre for Digital Music, Tech. Rep. C4DM-TR-09-06, 2009.

DRuaP+12

Norberto Degara, Enrique Argones Rúa, Antonio Pena, Soledad Torres-Guijarro, Matthew EP Davies, and Mark D Plumbley. Reliability-informed beat tracking of musical signals. IEEE Transactions on Audio, Speech, and Language Processing, 20(1):290–301, 2012.

DSdHMuller19

Jonathan Driedger, Hendrik Schreiber, W. Bas de Haas, and Meinard Müller. Towards automatically correcting tapped beat annotations for music recordings. In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR). Delft, The Netherlands, November 2019.

DBDR15

S. Durand, J. P. Bello, B. David, and G. Richard. Downbeat tracking with multiple features and deep neural networks. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2015. doi:10.1109/ICASSP.2015.7178001.

DBDR16

S. Durand, J. P. Bello, B. David, and G. Richard. Feature adapted convolutional neural networks for downbeat tracking. In IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP). 2016.

DBDR17

S. Durand, J. P. Bello, B. David, and G. Richard. Robust downbeat tracking using an ensemble of convolutional networks. IEEE/ACM Trans. on Audio, Speech, and Language Processing, 25(1):76–89, January 2017.

DE16

S. Durand and S. Essid. Downbeat detection with conditional random fields and deep learned features. In 17th Int. Soc. for Music Information Retrieval Conf. (ISMIR 2016), 386–392. New York, USA, August 2016.

DDR14

Simon Durand, Bertrand David, and Gaël Richard. Enhancing downbeat detection when facing different music styles. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3132–3136. IEEE, 2014.

Ell07

Daniel PW Ellis. Beat tracking by dynamic programming. Journal of New Music Research, 36(1):51–60, 2007.

FJDE15

T. Fillon, C. Joder, S. Durand, and S. Essid. A conditional random field system for beat tracking. In IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 424–428. South Brisbane, Australia, April 2015.

FMR+19

M. Fuentes, L. S. Maia, M. Rocamora, L. W. P. Biscainho, H. C. Crayencour, S. Essid, and J. P. Bello. Tracking beats and microtiming in afro-latin american music using conditional random fields and deep learning. In 20th International Society for Music Information Retrieval Conference, ISMIR. 2019.

FMC+19

M. Fuentes, B. McFee, H.C. Crayencour, S. Essid, and J.P. Bello. A music structure informed downbeat tracking system using skip-chain conditional random fields and deep learning. In 44th Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 481–485. Brighton, UK, May 2019.

FMC+18

Magdalena Fuentes, Brian McFee, Hélène Crayencour, Slim Essid, and Juan Bello. Analysis of common design choices in deep learning systems for downbeat tracking. In The 19th International Society for Music Information Retrieval Conference. 2018.

GBC16

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT Press, 2016.

Got01

Masataka Goto. An audio-based real-time beat tracking system for music with or without drum-sounds. Journal of New Music Research, 30(2):159–171, 2001.

GTH19

Alexander Greaves-Tunnell and Zaid Harchaoui. A statistical investigation of long memory in language and music. arXiv preprint arXiv:1904.03834, 2019.

GSKoutnik+16

Klaus Greff, Rupesh K Srivastava, Jan Koutník, Bas R Steunebrink, and Jürgen Schmidhuber. LSTM: a search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10):2222–2232, 2016.

HM04

Stephen W Hainsworth and Malcolm D Macleod. Particle filtering applied to musical tempo tracking. EURASIP Journal on Advances in Signal Processing, 2004(15):927847, 2004.

HZCPerpinan04

Xuming He, Richard S Zemel, and Miguel Á Carreira-Perpiñán. Multiscale conditional random fields for image labeling. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., volume 2, II–II. IEEE, 2004.

HS97

Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.

HDF12

Jason Hockman, Matthew EP Davies, and Ichiro Fujinaga. One in the jungle: downbeat detection in hardcore, jungle, and drum and bass. In ISMIR, 169–174. 2012.

HKS14

A. Holzapfel, F. Krebs, and A. Srinivasamurthy. Tracking the 'odd': meter inference in a culturally diverse music corpus. In 15th Int. Society for Music Information Retrieval Conf. (ISMIR), 425–430. Taipei, Taiwan, October 2014.

HG16

André Holzapfel and Thomas Grill. Bayesian meter tracking on learned signal representations. In Proc. of the 17th Int. Society for Music Information Retrieval Conf. (ISMIR), 262–268. 2016.

HDZ+12

André Holzapfel, Matthew E. P. Davies, José R. Zapata, João Lobato Oliveira, and Fabien Gouyon. Selective sampling for beat tracking evaluation. IEEE Transactions on Audio, Speech, and Language Processing, 20(9):2539–2548, 2012. doi:10.1109/TASL.2012.2205244.

IS15

S. Ioffe and C. Szegedy. Batch normalization: accelerating deep network training by reducing internal covariate shift. In 32nd International Conference on Machine Learning (ICML). 2015.

JLL19

Bijue Jia, Jiancheng Lv, and Dayiheng Liu. Deep learning-based automatic downbeat tracking: a brief review. Multimedia Systems, pages 1–22, 2019.

JZS15

Rafal Jozefowicz, Wojciech Zaremba, and Ilya Sutskever. An empirical exploration of recurrent network architectures. In International Conference on Machine Learning, 2342–2350. 2015.

KFRO12

Maksim Khadkevich, Thomas Fillon, Gaël Richard, and Maurizio Omologo. A probabilistic approach to simultaneous extraction of beats and downbeats. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 445–448. IEEE, 2012.

KB14

D. Kingma and J. Ba. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

KEA06

A.P. Klapuri, A.J. Eronen, and J.T. Astola. Analysis of the meter of acoustic musical signals. IEEE Transactions on Audio, Speech, and Language Processing, 14(1):342–355, 2006. doi:10.1109/TSA.2005.854090.

KFH+15

Peter Knees, Angel Faraldo, Perfecto Herrera, Richard Vogl, Sebastian Böck, Florian Hörschläger, and Mickael Le Goff. Two data sets for tempo estimation and key detection in electronic dance music annotated from user corrections. In Proc. of the 16th Intl. Society for Music Information Retrieval Conf. (ISMIR), 364–370. 2015.

KBockW14

Filip Korzeniowski, Sebastian Böck, and Gerhard Widmer. Probabilistic extraction of beat positions from a beat activation function. In ISMIR, 513–518. 2014.

KBockW11

F. Krebs, S. Böck, and G. Widmer. An efficient state space model for joint tempo and meter tracking. In 16th International Society for Music Information Retrieval Conference (ISMIR). 2015.

KBockDW16

F. Krebs, S. Böck, M. Dorfer, and G. Widmer. Downbeat tracking using beat synchronous features with recurrent neural networks. In 17th International Society for Music Information Retrieval Conference (ISMIR). 2016.

KBockW13

Florian Krebs, Sebastian Böck, and Gerhard Widmer. Rhythmic pattern modeling for beat and downbeat tracking in musical audio. In ISMIR, 227–232. 2013.

KHCW15

Florian Krebs, Andre Holzapfel, Ali Taylan Cemgil, and Gerhard Widmer. Inferring metrical structure in music using particle filters. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(5):817–827, 2015.

KH04

Sanjiv Kumar and Martial Hebert. Discriminative fields for modeling spatial dependencies in natural images. In Advances in Neural Information Processing Systems, 1531–1538. 2004.

Lon04

J. London. Hearing in Time: Psychological Aspects of Musical Meter. Oxford University Press, New York, USA, 2004.

MBock19

Matthew EP Davies and Sebastian Böck. Temporal convolutional networks for musical audio beat tracking. In 2019 27th European Signal Processing Conference (EUSIPCO), 1–5. IEEE, 2019.

McF18

Brian McFee. Statistical methods for scene and event classification. In Computational Analysis of Sound Scenes and Events, pages 103–146. Springer International Publishing, Cham, 2018. URL: https://doi.org/10.1007/978-3-319-63450-0_5, doi:10.1007/978-3-319-63450-0_5.

MB17

Brian McFee and Juan P. Bello. Structured training for large-vocabulary chord recognition. In 18th International Society for Music Information Retrieval Conference, ISMIR. 2017.

MR02

missing journal in murphy2002dynamic

NH10

Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 807–814. 2010.

NRJB15

L. Nunes, M. Rocamora, L. Jure, and L. W. P. Biscainho. Beat and downbeat tracking based on rhythmic patterns applied to the uruguayan candombe drumming. In 16th Int. Soc. for Music Information Retrieval Conf. (ISMIR), 264–270. Málaga, Spain, October 2015.

PT16

Hélène Papadopoulos and George Tzanetakis. Models for music analysis from a Markov logic networks perspective. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(1):19–34, 2016.

PP10a

Hélène Papadopoulos and Geoffroy Peeters. Joint estimation of chords and downbeats from an audio signal. IEEE Transactions on Audio, Speech, and Language Processing, 19(1):138–152, 2010.

PHV16

Giambattista Parascandolo, Heikki Huttunen, and Tuomas Virtanen. Recurrent neural networks for polyphonic sound event detection in real life recordings. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6440–6444. IEEE, 2016.

PP10b

Geoffroy Peeters and Hélène Papadopoulos. Simultaneous beat and downbeat-tracking using a probabilistic framework: theory and large-scale evaluation. IEEE Transactions on Audio, Speech, and Language Processing, 19(6):1754–1769, 2010.

PBockCD21

António S Pinto, Sebastian Böck, Jaime S Cardoso, and Matthew EP Davies. User-driven fine-tuning for beat tracking. Electronics, 10(13):1518, 2021.

PLV+19

Hendrik Purwins, Bo Li, Tuomas Virtanen, Jan Schlüter, Shuo-Yiin Chang, and Tara Sainath. Deep learning for audio signal processing. IEEE Journal of Selected Topics in Signal Processing, 13(2):206–219, 2019.

Rab89

Lawrence R Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.

Sch98

Eric D Scheirer. Tempo and beat analysis of acoustic musical signals. The Journal of the Acoustical Society of America, 103(1):588–601, 1998.

SchluterBock14

Jan Schlüter and Sebastian Böck. Improved musical onset detection with convolutional neural networks. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6979–6983. IEEE, 2014.

SUM20

H. Schreiber, J. Urbano, and M. Müller. Music tempo estimation: are we done yet? Transactions of the International Society for Music Information Retrieval, 3(1):111–123, 2020. doi:10.5334/tismir.43.

SP97

Mike Schuster and Kuldip K Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681, 1997.

Set05

Burr Settles. ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text. Bioinformatics, 21(14):3191–3192, 2005.

SP03

Fei Sha and Fernando Pereira. Shallow parsing with conditional random fields. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1, 134–141. Association for Computational Linguistics, 2003.

SBD16

Siddharth Sigtia, Emmanouil Benetos, and Simon Dixon. An end-to-end neural network for polyphonic piano music transcription. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(5):927–939, 2016.

SjobergL95

Jonas Sjöberg and Lennart Ljung. Overtraining, regularization and searching for a minimum, with application to neural networks. International Journal of Control, 62(6):1391–1407, 1995.

SHCS15

A. Srinivasamurthy, A. Holzapfel, A. T. Cemgil, and X. Serra. Particle filters for efficient meter tracking with dynamic bayesian networks. In 16th Int. Society for Music Information Retrieval Conf. (ISMIR). 2015. URL: http://hdl.handle.net/10230/34998.

SHCS16

A. Srinivasamurthy, A. Holzapfel, A. T. Cemgil, and X. Serra. A generalized bayesian model for tracking long metrical cycles in acoustic music signals. In IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 76–80. Shanghai, China, March 2016.

SHS17

Ajay Srinivasamurthy, André Holzapfel, and Xavier Serra. Informed automatic meter analysis of music recordings. In Proc. of the 18th Int. Society for Music Information Retrieval Conf. (ISMIR). 2017.

SHS14

Ajay Srinivasamurthy, André Holzapfel, and Xavier Serra. In search of automatic rhythm analysis methods for Turkish and Indian art music. Journal of New Music Research, 43(1):94–114, 2014.

SR21

Christian J Steinmetz and Joshua D Reiss. WaveBeat: end-to-end beat and downbeat tracking in the time domain. arXiv preprint arXiv:2110.01436, 2021.

SM06

C. Sutton and A. McCallum. An introduction to conditional random fields for relational learning. In Lise Getoor and Ben Taskar, editors, Introduction to Statistical Relational Learning, chapter 4, pages 93–128. MIT Press, Cambridge, USA, 2006.

SM12

Charles Sutton and Andrew McCallum. An introduction to conditional random fields. Foundations and Trends® in Machine Learning, 4(4):267–373, 2012.

USchluterG14

Karen Ullrich, Jan Schlüter, and Thomas Grill. Boundary detection in music structure analysis using convolutional neural networks. In ISMIR, 417–422. 2014.

VDWK17

Richard Vogl, Matthias Dorfer, Gerhard Widmer, and Peter Knees. Drum transcription via joint beat and drum modeling using convolutional recurrent neural networks. In ISMIR, 150–157. 2017.

W+90

Paul J Werbos and others. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550–1560, 1990.

WCG06

N. Whiteley, A. T. Cemgil, and S. J. Godsill. Bayesian modelling of temporal structure in musical audio. In 7th Int. Society for Music Information Retrieval Conf. (ISMIR). 2006.

ZNY19

missing journal in zahraybeat

ZDGomez14

José R Zapata, Matthew EP Davies, and Emilia Gómez. Multi-feature beat tracking. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(4):816–825, 2014.

ZZH+17

Chen Zhu, Yanpeng Zhao, Shuaiyi Huang, Kewei Tu, and Yi Ma. Structured attentions for visual question answering. In Proceedings of the IEEE International Conference on Computer Vision, 1291–1300. 2017.

TheanoDTeam16

Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, May 2016. URL: http://arxiv.org/abs/1605.02688.