{ "cells": [ { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "[Binder](https://mybinder.org/v2/git/https%3A%2F%2Fgitlab.dsi.universite-paris-saclay.fr%2Fbruno.denis%2Fintro_jupyter/HEAD?labpath=notebooks%2Fcovid_19.ipynb)\n", "[nbviewer](https://nbviewer.org/urls/gitlab.dsi.universite-paris-saclay.fr/bruno.denis/intro_jupyter/-/raw/main/notebooks/covid_19.ipynb)\n", "\n", "# Analysis of COVID-19 cases\n", "\n", "This Jupyter notebook analyzes open data that is available online and updated daily. It identifies the 10 countries with the highest total number of COVID-19 cases reported since 2020.\n", "\n", "- The data are read from an open-data sharing website (updated daily).\n", "- Rows for locations that are not countries are removed.\n", "- The remaining data are grouped by location (`location`). For each location, the maximum of the cumulative `total_cases` column (its most recent count) is kept, and the locations are sorted in descending order of that value.\n", "\n", "The 10 countries with the most cases in total are then displayed, first as a table and then as a bar chart." 
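] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The analysis step takes the maximum of `total_cases` per location rather than the sum. `total_cases` is a cumulative count repeated on every daily row, so summing it would count the same cases many times. The hypothetical cell below is a minimal sketch on invented toy values (the locations `A` and `B` are not OWID data). It assumes each location's cumulative series never decreases, so that its maximum equals its most recent value." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas\n", "\n", "# Hypothetical toy data (invented values, not OWID data): one row per day,\n", "# with a cumulative total_cases column for two made-up locations.\n", "toy = pandas.DataFrame(\n", "    {\n", "        'location': ['A', 'A', 'A', 'B', 'B', 'B'],\n", "        'total_cases': [10.0, 20.0, 30.0, 1.0, 2.0, 40.0],\n", "    }\n", ").set_index('location')\n", "\n", "# Summing a cumulative column re-counts earlier cases every day: A=60, B=43\n", "summed = toy.groupby('location').sum().sort_values('total_cases', ascending=False)\n", "# The maximum of a non-decreasing cumulative series is its latest value: B=40, A=30\n", "latest = toy.groupby('location').max().sort_values('total_cases', ascending=False)\n", "\n", "print(summed)\n", "print(latest)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the sum, the made-up location `A` would rank first even though `B` has more cases to date. Taking the maximum, as the analysis cell does, ranks `B` first."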
] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "The main features of this notebook are:\n", "\n", "- **Reproducibility**, by using online data in CSV format read with the `pandas` module\n", "- **Tabular and graphical display**, using the `pandas` module" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "outputs": [], "source": [ "import pandas\n", "import datetime" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Reading the data from the \"*Our World in Data*\" website" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# Load the dataset from the open data source\n", "data_set_url = \"https://covid.ourworldindata.org/data/owid-covid-data.csv\"\n", "covid_dataset = pandas.read_csv(data_set_url)" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## Removing data not related to a country" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# Values of the location column that are not countries (continents, the EU, income groups)\n", "not_country_location = [\n", "    \"World\",\n", "    \"Asia\",\n", "    \"Europe\",\n", "    \"North America\",\n", "    \"South America\",\n", "    \"European Union\",\n", "    \"Africa\",\n", "    \"Oceania\",\n", "    \"Upper middle income\",\n", "    \"High income\",\n", "    \"Lower middle income\",\n", "    \"Low income\",\n", "]\n", "# Drop rows whose location is not a country\n", "covid_dataset = covid_dataset[\n", "    ~covid_dataset['location'].isin(not_country_location)\n", "]\n", "# Set the location column as the index\n", "covid_dataset = covid_dataset.set_index('location')" ] },
{ "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## Data analysis" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# Keep only the cumulative total_cases column, take its maximum per location\n", "# (the most recent count), then sort from highest to lowest\n", "world_covid_cases = covid_dataset.filter(\n", "    [\"total_cases\"]\n", ").groupby(\"location\").max().sort_values(\"total_cases\", ascending=False)" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## Tabular display of the analysis result" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Analyse du 2024-05-31\n" ] }, { "data": { "text/html": [ "
| \n", " | total_cases | \n", "
|---|---|
| location | \n", "\n", " |
| United States | \n", "103436829.0 | \n", "
| China | \n", "99356481.0 | \n", "
| India | \n", "45038518.0 | \n", "
| France | \n", "38997490.0 | \n", "
| Germany | \n", "38437756.0 | \n", "
| Brazil | \n", "37519960.0 | \n", "
| South Korea | \n", "34571873.0 | \n", "
| Japan | \n", "33803572.0 | \n", "
| Italy | \n", "26722507.0 | \n", "
| United Kingdom | \n", "24927820.0 | \n", "