This week's newsletter is dedicated to coding. Coding for research, coding for visualisation, and coding for fun!
Make Pandas easier
We use Pandas (a Python module for data analysis) a lot, and we have largely replaced Excel with Pandas code. But Pandas is far from easy to learn, and one of the reasons is that it suffers heavily from “feature bloat”. In trying to cater for all possible needs, it has had to implement a lot of features. That's why we agree with almost everything Ted Petrou writes in his Medium post from last week: Minimally Sufficient Pandas https://medium.com/dunder-data/minimally-sufficient-pandas-a8e67f2a2428
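To give a feel for what “feature bloat” means in practice, here is a small sketch of our own (not taken from Ted Petrou's post) showing two pairs of Pandas features that do the same job. The data frame and its values are made up for the illustration.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration only
df = pd.DataFrame({
    "city": ["Malmö", "Umeå", "Lund"],
    "crimes": [120, None, 45],
})

# Two ways to select the same column: brackets and attribute access
by_brackets = df["crimes"]
by_attribute = df.crimes

# Two method names for the same missing-value check
missing_isna = df["crimes"].isna()
missing_isnull = df["crimes"].isnull()

# Both comparisons confirm that the pairs give identical results
print(by_brackets.equals(by_attribute))
print(missing_isna.equals(missing_isnull))
```

Picking one spelling from each pair and sticking to it is the kind of habit that makes Pandas code easier to read and to teach.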
The Python equivalent of R's ggplot2 package would be Matplotlib. It, too, is highly customizable once you figure out how it works. We use it for our news service Newsworthy. See it in action in the recently published reports on crime trends in Swedish municipalities: http://newsworthy.se/sv/report/crime/
We use our own wrapper around Matplotlib to make it easier to create charts in a uniform style. The library is open source and available here: https://github.com/jplusplus/newsworthycharts
Swedish language analysis
Whatever language you code in, if you ever do text analysis in Swedish, you'll want to bookmark Peter Dahlgren's collection of Swedish language data. Here you'll find things like lemma dictionaries (for finding, more or less, the right word stem), stop words (for filtering out things like prepositions), and all the other resources that are so easy to find for English, yet so scarce for smaller languages. https://github.com/peterdalle/svensktext
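To show how stop words and a lemma dictionary are typically put to work, here is a minimal sketch in plain Python. The word lists below are tiny hand-made samples of our own, not the contents or file format of the collection above; in a real project you would load the full lists from files. The function name `normalize` is also just our choice for the example.

```python
import re

# Hypothetical, hand-made samples; real lists would be loaded from files
STOP_WORDS = {"och", "i", "på", "en", "det", "som"}
LEMMAS = {"bilarna": "bil", "bilar": "bil", "husen": "hus"}


def normalize(text):
    """Lowercase, tokenize, drop stop words and map words to their lemma."""
    # \w is Unicode-aware in Python 3, so å, ä and ö stay inside words
    tokens = re.findall(r"\w+", text.lower())
    # Words missing from the lemma dictionary are kept unchanged
    return [LEMMAS.get(token, token) for token in tokens if token not in STOP_WORDS]


words = normalize("Bilarna och husen i Malmö")
print(words)
```

With these sample lists, the sentence is reduced to the three words that carry meaning, with the two inflected nouns mapped back to their base forms.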