Imapsync, OAuth2, Google, Office365

We need to migrate a whole bunch of email accounts from Google Workspace to Office 365, and we’re not the administrators for either of the tenants so we don’t have access to any of the tools which make that relatively simple. The administrators have expressed no interest in making those tools available to us as a paid service, either, so we’re pretty much on our own. Here’s my plan (currently still undergoing some revision as the folder mapping isn’t working quite the way I expected):

Links:

Tinyproxy isn’t strictly necessary, but I’m hoping it’ll make things just a little easier for my users.

Configuration:

For the emailproxy config, you’ll need a section for each server you want to proxy. It has pretty good documentation, so I’ll be light here, but basically each server section is named [PROTOCOL-LOCALPORT] with a remote server address and port specified:

[IMAP-1993]
server_address = outlook.office365.com
server_port = 993

[IMAP-2993]
server_address = imap.gmail.com
server_port = 993

You’ll also need a section for each user account. This bit’s going to be specific to the UW, since it requires the use of our tenant ID and our particular account ID format.

[their-netid@uw.edu]
permission_url = https://login.microsoftonline.com/f6b6dd5b-f02f-441a-99a0-162ac5060bd2/oauth2/v2.0/authorize
token_url = https://login.microsoftonline.com/f6b6dd5b-f02f-441a-99a0-162ac5060bd2/oauth2/v2.0/token
oauth2_scope = https://outlook.office365.com/IMAP.AccessAsUser.All https://outlook.office365.com/POP.AccessAsUser.All https://outlook.office365.com/SMTP.Send offline_access
redirect_uri = http://localhost:8080
client_id = 08162f7c-0fd2-4200-a84a-f25a4db0b584
client_secret = TxRBilcHdC6WGBee]fs?QR:SJ8nI[g82

[their-netid@gamail.uw.edu]
permission_url = https://accounts.google.com/o/oauth2/auth
token_url = https://oauth2.googleapis.com/token
oauth2_scope = https://mail.google.com/
redirect_uri = http://localhost:8081
client_id = 406964657835-aq8lmia8j95dhl1a2bvharmfk3t1hgqj.apps.googleusercontent.com
client_secret = kSmqreRr0qwBWJgbf5Y-PjSU

A few explanations about the values there:

  • Each email address needs to be unique. So despite the email address for both accounts being “their-netid@uw.edu”, I have to make at least one of them different. I know that Google Workspace accepts “their-netid@gamail.uw.edu” as a username, and I assume that Office 365 has an equivalent alternate format, but I don’t know what it is.
  • The client ID and client secrets here are from Mozilla Thunderbird, because reusing someone else’s existing application registration is a lot easier than registering your own. This will mean that your users get a dialog saying that Thunderbird wants access to their account, so you’ll have to explain that.
  • The guid beginning “f6b6dd5b” is the UW’s Office 365 tenant ID. You may be able to use “common”, or you might have to use your own tenant ID.
  • You’ll need a pair like this for every user you want to migrate, preferably all in advance, because the proxy server doesn’t handle reloading its configuration on the fly well. At least it doesn’t when running in a non-gui mode; it might work better if launched in gui mode. In any case, when you add an account stanza to the file, the server will write the authentication tokens it gets to that stanza during operation, so you can’t easily edit the config file and add another user while the server is still running. Since email migration could take days for any given user, this could be a problem.
  • The documentation says that you should use a different redirect URL for each account. I don’t think this is necessary as long as you only have one person authenticating at a time; you should be able to get away with reusing the same pair of URLs for every user. It closes the server listening on that redirect URL whenever it’s not actively waiting for a response, so if you’re going one by one there shouldn’t be any conflicts. (But it does try to open the second account’s response listener before it’s finished shutting down the first, so you do need different ports for each service for a given user.) Also, you have to use “localhost”, not a public URL: the application registration includes that, and if you try to change it you’ll get an authentication error because yours doesn’t match Thunderbird’s. If you want to register your own app, you could do this directly without using tinyproxy, but then you probably would want to change the port for every account.
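Since every user needs a pair of stanzas written in advance, generating them from a list of netids might save some typing. This is only a sketch: the function name `account_stanzas`, the environment variable names, and the `netids.txt` file are all made up for illustration, and the URLs and scopes are simply copied from the example stanzas above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print both emailproxy account stanzas for one netid.
# Client IDs and secrets come from environment variables (names invented here),
# falling back to an obvious placeholder when they are not set.
TENANT="${TENANT:-f6b6dd5b-f02f-441a-99a0-162ac5060bd2}"

account_stanzas() {
    local netid="$1"
    cat <<EOF
[${netid}@uw.edu]
permission_url = https://login.microsoftonline.com/${TENANT}/oauth2/v2.0/authorize
token_url = https://login.microsoftonline.com/${TENANT}/oauth2/v2.0/token
oauth2_scope = https://outlook.office365.com/IMAP.AccessAsUser.All https://outlook.office365.com/POP.AccessAsUser.All https://outlook.office365.com/SMTP.Send offline_access
redirect_uri = http://localhost:8080
client_id = ${O365_CLIENT_ID:-CHANGE_ME}
client_secret = ${O365_CLIENT_SECRET:-CHANGE_ME}

[${netid}@gamail.uw.edu]
permission_url = https://accounts.google.com/o/oauth2/auth
token_url = https://oauth2.googleapis.com/token
oauth2_scope = https://mail.google.com/
redirect_uri = http://localhost:8081
client_id = ${GOOGLE_CLIENT_ID:-CHANGE_ME}
client_secret = ${GOOGLE_CLIENT_SECRET:-CHANGE_ME}

EOF
}

# Example (netids.txt is a hypothetical file with one netid per line):
#   while read -r netid; do account_stanzas "$netid"; done < netids.txt >> emailproxy.config
```

Run it before starting the proxy, since the proxy rewrites the config file while it’s running.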

For tinyproxy, I’m just going to be doing a simple proxy from my external IP on port 8080 to localhost:8080 and the equivalent on port 8081:

# tinyproxy-8080.conf
Port 8080
Listen 10.158.XX.YY
Timeout 600
Allow 10.0.0.0/8
ReversePath "/" "http://127.0.0.1:8080/"
ReverseOnly Yes

# tinyproxy-8081.conf
Port 8081
Listen 10.158.XX.YY
Timeout 600
Allow 10.0.0.0/8
ReversePath "/" "http://127.0.0.1:8081/"
ReverseOnly Yes

(NB: 10.158… is the “external” IP of the machine I’m running this on, accessible within the UW network, and I’m allowing access from 10.0.0.0/8 because that covers the entire building and also people connected to the VPN. I’ll need to adjust that if I’ve got people in other parts of campus not on the 10 net.)

I do need two ports to avoid the second one trying to open before the first is closed. But it’s still only two ports total, so make two config files, one for port 8080 and one for 8081. As long as you don’t have two users trying to authenticate at once, you shouldn’t run into conflicts.
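Because the two files differ only in the port number, they could be written from one template. A minimal sketch, assuming the directives shown above are all you need; `write_tinyproxy_conf` is a name I made up, and the output file names match the ones used in the screen commands later on.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write tinyproxy-PORT.conf for one redirect port.
# Arguments: port, address to listen on, optional output directory (default: current).
write_tinyproxy_conf() {
    local port="$1" listen_ip="$2" outdir="${3:-.}"
    cat > "${outdir}/tinyproxy-${port}.conf" <<EOF
Port ${port}
Listen ${listen_ip}
Timeout 600
Allow 10.0.0.0/8
ReversePath "/" "http://127.0.0.1:${port}/"
ReverseOnly Yes
EOF
}

# Example: one file per redirect port (substitute the machine's real address):
#   write_tinyproxy_conf 8080 10.158.XX.YY
#   write_tinyproxy_conf 8081 10.158.XX.YY
```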

For imapsync, I’ve got a shell script which looks like this:

#!/usr/bin/env bash
imapsync \
    --host1 localhost \
    --port1 2993 \
    --no-ssl1 \
    --user1 "$1@gamail.uw.edu" \
    --password1 'NotRealPassword' \
    --host2 localhost \
    --port2 1993 \
    --no-ssl2 \
    --user2 "$1@uw.edu" \
    --password2 'NotRealPassword' \
    --gmail1 \
    --maxsize 45000000 \
    --maxmessagespersecond 4 \
    --regexflag 's/\\Flagged//g' \
    --disarmreadreceipts \
    --regexmess 's,(.{10500}),$1\r\n,g' \
    --maxbytespersecond 40000 \
    --maxbytesafter 3000000000 \
    --useheader="X-Gmail-Received" \
    --useheader "Message-Id" \
    --regextrans2 "s,\[Gmail\].,," \
    --regextrans2 's,^INBOX/(.+),$1,' \
    --f1f2 '[Gmail]/All Mail'='Archived in Gmail' \
    --folderfirst '\[Gmail\]/Starred' \
    --folderlast "INBOX" \
    --folderlast "[Gmail]/All Mail" \
    --exclude '\[Gmail\]/Important' \
    --exclude '\[Gmail\]/Spam' \
    --exclude '\[Gmail\]/Trash' \
    --exclude '\[Gmail\]/Snoozed' \
    --skipcrossduplicates \
    --automap \
    --noexpunge

I think this is how I want it to work. I don’t know what “Snoozed” is, but I know I don’t want to copy spam and trash over, and since “Important” is auto-tagged I don’t want to copy that either, since all the messages in it will also be somewhere else. So what this should do is copy all the starred messages first, into a folder named “Starred” (I’ll have to ask the user if this is the behavior they prefer), then all other folders/tags in whatever internal order imapsync uses (which may cause problems for people who assign multiple labels to messages in gmail), then the Inbox (not duplicating anything that had already been found in a folder), and then everything else (again not duplicating anything already seen) into a folder named “Archived in Gmail” because if it finds any non-duplicate messages in there, they should all be messages which have no labels, meaning they were archived from the inbox. (As far as I can tell, “Archive” in Gmail just means “remove the Inbox label”.)
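To get a rough idea of what the two --regextrans2 rules do to a folder name, they can be approximated with sed. This is an approximation only: imapsync applies Perl regexes, not sed, and this sketch ignores --f1f2, --automap, and --exclude entirely, so it won’t show the “Archived in Gmail” rename. The function name `preview_folder_names` is made up.

```shell
#!/usr/bin/env bash
# Rough preview of the two --regextrans2 rules from the script above,
# using sed in place of imapsync's Perl regexes.
# Reads folder names on stdin, one per line, and prints the rewritten names.
preview_folder_names() {
    sed -E -e 's,\[Gmail\].,,' -e 's,^INBOX/(.+),\1,'
}

# Example:
#   printf '%s\n' '[Gmail]/Starred' 'INBOX/Projects' 'INBOX' | preview_folder_names
```

imapsync also documents --dry and --justfolders options, which should show its real folder mapping without copying any messages; that is probably the more trustworthy check.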

These options were found through some trial and error, and there are almost certainly some which don’t do anything. I think a bunch of them are made obsolete by the --gmail1 and --gmail2 arguments, for example. I’ll try to clean this up once I find out what’s really necessary.

NB: “NotRealPassword” in the password arguments there can literally be that string or anything else. This password will be used to encrypt the auth tokens stored in the email proxy’s config file, but it won’t be used for any part of the actual email service authentication. Don’t change the password between runs, though, or you’ll need to perform the whole auth process again.

Running the proxies

I’m going to run each process in its own screen(1) window, and I don’t know how to set an environment variable for a command when launching it inside a new screen window, so I have to make a small helper script to do that:

#!/usr/bin/env bash
PYSTRAY_BACKEND=dummy python emailproxy.py --no-gui --local-server-auth --config-file emailproxy.config

The PYSTRAY_BACKEND needs to be there in order for the email proxy to run without a gui. And now that I think of it, I think there was a dependency missing after installing the python requirements; I think I had to install gir1.2-ayatanaappindicator3-0.1 and/or gir1.2-appindicator3-0.1 with apt before it would run at all. This is probably all easier if you’re using a gui. Oh well.
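One possible way around needing a helper script: env(1) sets a variable and then runs whatever command follows it, so it can be handed to screen as the command to launch. I haven’t tested this with screen on this setup, so treat it as a sketch; the function name `with_dummy_tray` is made up.

```shell
#!/usr/bin/env bash
# env(1) sets PYSTRAY_BACKEND only for the command that follows it.
with_dummy_tray() {
    env PYSTRAY_BACKEND=dummy "$@"
}

# Inside a running screen session, the same idea without a wrapper function:
#   screen -t authproxy env PYSTRAY_BACKEND=dummy python emailproxy.py \
#       --no-gui --local-server-auth --config-file emailproxy.config
```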

Now launch both the proxy servers:

screen -t 8080 tinyproxy -d -c tinyproxy-8080.conf
screen -t 8081 tinyproxy -d -c tinyproxy-8081.conf
screen -t authproxy ./launch.sh

NB: you definitely need access to the output of the email proxy, because it’s going to be giving you URLs for the users to visit for authentication and authorization.

Running imapsync

Get your user on a zoom call or in a text chat, something where you can send them a long URL to click on. Have them open Gmail, go to their settings, and make sure that IMAP is enabled. (Also, well before this would be a good time to tell them to organize their mail and delete old messages they don’t need any more.)

Now run the imapsync script with their netid as the argument. For example:

./migrate.sh burkefax

Over in the window with the output of the OAuth proxy, look for a URL it wants you to open:

2022-08-17 17:16:18: Email OAuth 2.0 Proxy Local server auth mode: please authorise a request for account burkefax@gamail.uw.edu
2022-08-17 17:16:18: Please visit the following URL to authenticate account burkefax@gamail.uw.edu: https://accounts.google.com/o/oauth2/auth?client_id=406964657835-aq8lmia8j95dhl1a2bvharmfk3t1hgqj.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A8081&scope=https%3A%2F%2Fmail.google.com%2F&response_type=code&access_type=offline&login_hint=burkefax%40gamail.uw.edu
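These URLs are long and easy to mangle when copying out of a terminal. If the proxy’s output is also being saved to a file, something like the following could pull out the most recent one. It’s a sketch that matches the wording of the sample log lines above; `latest_auth_url` and `proxy.log` are names I made up, and I’m assuming the log format stays the same between versions.

```shell
#!/usr/bin/env bash
# Print the most recent authentication URL found in emailproxy output read on stdin.
# The pattern is taken from the sample "Please visit the following URL" log line.
latest_auth_url() {
    sed -n 's/^.*Please visit the following URL to authenticate account [^ ]*: //p' | tail -n 1
}

# Example, assuming the proxy output was saved with something like
#   ./launch.sh 2>&1 | tee proxy.log     (proxy.log is a hypothetical file name)
#   latest_auth_url < proxy.log
```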

Give that URL to the user and tell them to open it in a browser, preferably the one they just opened Gmail in. That’ll skip them having to log in to Google again, and it’s why I don’t have a screenshot of that step here. Once they’ve logged in (or if they were already logged in) they’ll get an access request dialog:

Explain to them that while this is labeled Mozilla Thunderbird, it will give your email migration tool access to connect to their Gmail account to get messages to transfer over. Remind them that if anyone but you asks them to do something like this, they absolutely should not do it, because they will be giving that person access to their email account.

Once they’ve clicked “Allow”, they’ll be redirected to a URL beginning with http://localhost:8081/, which should fail unless they’ve got a server running on their own desktop and listening there:

Once that’s happened, tell them to replace “localhost” in the URL with the name or IP of the machine where you’re running the OAuth proxy. Be sure they keep the port number and everything else, though. Again, maybe let them know that if anyone else tries to walk them through a process like this, they are almost certainly being scammed.
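To be concrete about the edit the user makes in their address bar, here it is as a substitution. The sample redirect URL in the comment is invented for illustration (the real one will carry whatever parameters the provider adds), and `rewrite_redirect` is a made-up name.

```shell
#!/usr/bin/env bash
# Swap "localhost" for the proxy machine's name or IP,
# keeping the port, path, and query string exactly as they were.
rewrite_redirect() {
    local url="$1" host="$2"
    printf '%s\n' "$url" | sed "s,^http://localhost\([:/]\),http://${host}\1,"
}

# Example with an invented redirect URL:
#   rewrite_redirect 'http://localhost:8081/?code=EXAMPLE' 10.158.XX.YY
```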

Once they’ve done that and pressed enter to go to the new URL, they should see a message like this:

Email OAuth 2.0 Proxy successfully authenticated account their-netid@gamail.uw.edu. You can now close this window.

Go back to the window running the OAuth proxy and look for another URL. This one will be for Office 365. Give them that URL and have them log in and approve the permissions request again:

They’ll be sent to a localhost:8080 URL; have them replace “localhost” with your machine’s name or IP again and press enter to go there. They should get the successful authentication message again.

Depending on the timing, your imapsync script now should either start chugging along and copying their mail over, or it might fail. If it fails, run it again; now that the user has logged in through the OAuth proxy, it’s stored their refresh token and should be able to get new access tokens on its own when required. When the refresh token expires, probably in a couple of weeks, they’ll need to go through the authorization steps again to grant a new one if further imapsync runs are required.
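The “if it fails, run it again” step could be wrapped in a small retry loop. This is a generic sketch, not something I’ve run against a real migration; it assumes the script exits non-zero when imapsync fails, and `retry` is just a name I picked.

```shell
#!/usr/bin/env bash
# Run a command until it succeeds, up to a fixed number of attempts.
# Arguments: max attempts, seconds to wait between attempts, then the command.
retry() {
    local attempts="$1" delay="$2" n
    shift 2
    for ((n = 1; n <= attempts; n++)); do
        "$@" && return 0
        echo "attempt ${n} of ${attempts} failed" >&2
        if ((n < attempts)); then sleep "$delay"; fi
    done
    return 1
}

# Example: try the migration up to three times, a minute apart:
#   retry 3 60 ./migrate.sh burkefax
```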

At the end of the imapsync run, it will put a message in both accounts’ inboxes saying that it’s finished. You can stop that behavior by adding --noemailreport1 and --noemailreport2 to the imapsync options. (As with many other imapsync options, a switch ending with “1” refers to the “from” account and “2” refers to the “to” account.)

This process has some but not a lot of testing. I’ll be testing it more heavily over the next couple of weeks with real users, and I’ll come back and update things if any changes were required.
