Instagram Extractor
13th September 2020
A utility for converting my old Instagram posts to content for this site

The Bakes section of this site is mostly intended to be a photo gallery to replace/supplement my Instagram posts, so it felt pretty natural to want to migrate a subset of those posts over. With years of baking posts under my belt, though, I certainly didn’t want to do it by hand. The good news is that Instagram provides an option to download an archive of all your data from their site (I assume they only do this because of European data laws, but that might be uncharitable of me). When you download it you end up with a zip file containing a bunch of JSON files and a few directories of cryptically named images.

I’m not a programmer for nothing, though, so this seemed like a great opportunity to pop open Visual Studio and write myself a tool to manage the conversion process. Since I’m using Jekyll to create this site, all I really needed to do was convert the existing posts into a bunch of markdown files and put the associated images in the correct directory. Pretty straightforward stuff.

Here’s roughly the format I was going for:

---
layout: bake
title: Habib Bakes
date: 2017-02-28 19:21:00
image: efb27f23e815c6fe9b6eb7edc80d747d.jpg
---
So, basically they're home made Cheez-its. So tasty!
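To make the target concrete, here is a minimal sketch of how one post in that format could be assembled. It is not the tool's actual code: `BakePostWriter` and `BuildPost` are hypothetical names, and it assumes the title and caption need no YAML escaping.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class BakePostWriter {
    // Builds the text of one post: YAML front matter followed by the caption as the body.
    // Assumes title and caption contain nothing that needs YAML quoting (e.g. a colon in the title).
    public static string BuildPost(string title, DateTime takenAt, string imageFileName, string caption) {
        var sb = new StringBuilder();
        sb.Append("---\n");
        sb.Append("layout: bake\n");
        sb.Append("title: ").Append(title).Append('\n');
        // Invariant culture keeps the date format stable regardless of OS locale.
        sb.Append("date: ")
          .Append(takenAt.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture))
          .Append('\n');
        sb.Append("image: ").Append(imageFileName).Append('\n');
        sb.Append("---\n");
        sb.Append(caption).Append('\n');
        return sb.ToString();
    }
}
```

The resulting string can be written out with `File.WriteAllText` under whatever file name the site expects.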

I’ve spent much of the past few years focused on C# (when coding at all), but that was mostly in the context of the Unity game engine. It had been a while since I’d worked with WinForms. Thankfully, that kind of thing seems to be a bit like riding a bike, so it was pretty easy to get back into the swing of things.

The first order of business was to get the JSON parsing going, which required creating a set of simple classes to represent the data format provided by Instagram and installing the right parser dependency. The data layout ends up looking like this:

using System;
using System.Collections.Generic;
using System.Drawing;

namespace InstagramPostExtractor {

    // One entry in the export's photo list.
    public class InstagramPhoto {
        public InstagramPhoto() { }

        // Thumbnail, filled in later by the image loader.
        public Image photo { get; set; }
        public string caption { get; set; }
        public DateTime taken_at { get; set; }
        // Image location relative to the root of the extracted archive.
        public string path { get; set; }
    }

    public class InstagramVideo {
        public string caption { get; set; }
        public DateTime taken_at { get; set; }
        public string path { get; set; }
    }

    public class InstagramProfile {
        public string caption { get; set; }
        public DateTime taken_at { get; set; }
        public bool is_active_profile { get; set; }
        public string path { get; set; }
    }

    // Top-level object; property names match the JSON keys exactly.
    public class InstagramData {
        public List<InstagramPhoto> photos { get; set; }
        public List<InstagramVideo> videos { get; set; }
        public List<InstagramProfile> profile { get; set; }
    }
}

Then loading the JSON was as simple as:

String rawJsonString = File.ReadAllText(textBoxFileName.Text);
InstagramData data = JsonSerializer.Deserialize<InstagramData>(rawJsonString);
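To show what that call is matching against, here is a small self-contained sketch using `System.Text.Json`. The sample JSON is made up to mirror the class layout above, not copied from a real export. `PhotoEntry`, `ExportData`, and `ExportLoader` are hypothetical, trimmed-down stand-ins so the example doesn't depend on `System.Drawing`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Trimmed stand-in for InstagramPhoto (no Image property).
public class PhotoEntry {
    public string caption { get; set; }
    public DateTime taken_at { get; set; }
    public string path { get; set; }
}

// Trimmed stand-in for InstagramData.
public class ExportData {
    public List<PhotoEntry> photos { get; set; }
}

public static class ExportLoader {
    // Invented sample shaped like the classes; the values are illustrative only.
    public const string SampleJson =
        @"{ ""photos"": [ { ""caption"": ""So tasty!"", ""taken_at"": ""2017-02-28T19:21:00"", ""path"": ""photos/example.jpg"" } ] }";

    // By default the serializer matches JSON keys to property names case-sensitively,
    // which is why the properties keep the snake_case names used in the file.
    public static ExportData Parse(string rawJson) {
        return JsonSerializer.Deserialize<ExportData>(rawJson);
    }
}
```

Keys that have no matching property are skipped by default, so the classes only need to declare the fields the tool actually uses.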

Hooking up that loaded data to a DataGrid control was straightforward, but loading all those images was an ugly performance hit and I’m kind of allergic to UIs that freeze up, so clearly it was time for some async loading. Since so much of my C# work was within Unity, I somehow never got around to playing with the new (well, new to me anyway) async/await features, and this seemed the perfect excuse. I ended up trying a few different ways of doing this, but in the end I liked this parallel foreach construction the best:

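// Returns Task rather than void so callers can await it and observe exceptions.
public async Task LoadPhotosAsync() {
    // Read the UI value once, before handing work to background threads.
    String directory = textBoxDirectory.Text;

    await Task.Run(() => {
        Parallel.ForEach(data.photos, post => {
            String photoPath = Path.Combine(directory, post.path);

            // Dispose the full-size image once the thumbnail exists so the file is released.
            using (Image fullImage = Image.FromFile(photoPath)) {
                Image.GetThumbnailImageAbort myCallback = new Image.GetThumbnailImageAbort(ThumbnailCallback);
                post.photo = fullImage.GetThumbnailImage(128, 128, myCallback, IntPtr.Zero);
            }
        });
    });
}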

Since the data.photos list is already bound to the DataGrid, the rest just kind of took care of itself. After that it was just a bit of UI tweaking, like adding color coding to the rows when an exported post and image are detected in the correct locations, and making the post itself editable right from the grid. There were a few idiosyncrasies that gave me short bouts of trouble, of course, but that’s hardly a new thing with WinForms, and once I figured out which knobs to turn the work went quickly.
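The "already exported" check behind that color coding boils down to looking for two files. Here is a rough sketch of the idea rather than the tool's real code: `ExportStatus`, `IsExported`, and the directory parameters are all hypothetical names.

```csharp
using System.IO;

public static class ExportStatus {
    // True only when both the markdown post and its image are already in place.
    // All four parameters are assumptions about how the output is laid out.
    public static bool IsExported(string postsDirectory, string imagesDirectory,
                                  string postFileName, string imageFileName) {
        return File.Exists(Path.Combine(postsDirectory, postFileName))
            && File.Exists(Path.Combine(imagesDirectory, imageFileName));
    }
}
```

If the grid is a `DataGridView`, the result could then drive something like `row.DefaultCellStyle.BackColor` for each row.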

Instagram Post Extractor UI screenshot

With this tool up and running I was able to export 172 Instagram posts, edited and ready for the site, in under an hour. So not only was this project a fun diversion (which I deeply need during these pandemic and climate fire times), it also saved me a bunch of time overall.