BigDataFr recommends: Regular Expressions for Data Scientists
[…] As data scientists, diving headlong into huge heaps of data is part of the mission. Sometimes, this includes massive corpuses of text. For instance, suppose we were asked to figure out who’s been emailing whom in the scandal of the Panama Papers — we’d be sifting through 11.5 million documents! We could do that manually and read every last email ourselves, or we could leverage the power of Python. After all, a vital raison d’être of code is to automate tasks.
Even so, coding up a script from scratch requires a lot of time and energy. This is where regular expressions come in. Also known as RE, regex, and regular patterns, they form a compact language that allows us to sort through and analyze text in a jiffy. Regex began in 1956, when Stephen Cole Kleene created it as a notation to describe the McCulloch and Pitts model of the human nervous system. In the 1960s, Ken Thompson added the notation to a text editor similar to Notepad for Windows, and regex has grown to prominence since then.[…]