Software engineering, problem solving, nerd stuff

Running a Spring Boot JAR service with SELinux enabled

Just a quick reminder on how to run a Spring Boot JAR (or any other self-executable JAR) with SELinux enabled:

chcon --type=java_exec_t /opt/myapp/spring-boot-app.jar

To make this persistent you have to use the bin_t type, as java_exec_t is just an alias for it:

# apply the bin_t type persistently
semanage fcontext -a -t bin_t /opt/myapp/spring-boot-app.jar
# restore SELinux contexts
restorecon -R /opt/myapp

ll -Z /opt/myapp
# should look like
# -rwxr-xr-x. 1 myapp myapp unconfined_u:object_r:bin_t:s0 26500195 Aug 28 08:34 myapp.jar

To let systemd start this service, you have to create a systemd unit file at /etc/systemd/system/myapp.service:

[Unit]
Description=My Spring Boot application
After=syslog.target network.target

[Service]
ExecStart=/opt/myapp/spring-boot-app.jar
EnvironmentFile=-/etc/sysconfig/myapp
SuccessExitStatus=143
User=myapp

[Install]
WantedBy=multi-user.target

And don’t forget to add the service user, reload systemd and enable myapp.service.
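
A minimal sketch of those last steps (the user name myapp matches the file listing above; adjust names and paths to your setup):

# create a dedicated system user for the service
useradd -r -d /opt/myapp -s /sbin/nologin myapp
# make systemd pick up the new unit file
systemctl daemon-reload
# start the service now and on every boot
systemctl enable myapp.service
systemctl start myapp.service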

Using IPv6 with AWS Application Load Balancer (ALB)

Today I struggled for an hour or so to access an AWS-hosted web application through IPv6. Just follow these steps (an AWS CLI sketch of the same steps follows the list):

  • Get an IPv6 CIDR for your VPC: Go to VPC > Your VPCs > ${YOUR_VPC} > Edit CIDRs > Add IPv6 CIDR. The IPv6 CIDR is automatically chosen by AWS; you can’t configure the IPv6 CIDR on your own.
  • For the subnet(s) your ALB is located in, you have to allocate an IPv6 CIDR out of the VPC’s previously assigned IPv6 block. Go to VPC > Subnets > ${YOUR_ALB_SUBNETS} > Edit IPv6 CIDRs > Add IPv6 CIDR. You can create up to 256 IPv6 subnets (the VPC gets a /56, each subnet a /64).
  • You have to add a route for all IPv6 traffic to your route table. In VPC > Route Tables > ${YOUR_ROUTING_TABLE} > Routes > Edit, add “Destination=::/0” and “Target=${YOUR_IGW_ID}” as a route table entry. This was, BTW, the part I had forgotten.
  • Enable dualstack for your ALB. Go to EC2 > Load Balancers > ${YOUR_APPLICATION_LOAD_BALANCER} > Edit IP address type and select dualstack. The option is only available if your subnets have been previously configured with IPv6 CIDRs.
  • Your load balancer’s security group must allow HTTP and/or HTTPS traffic over IPv6. Go to EC2 > Security Groups > ${YOUR_APPLICATION_LOAD_BALANCERS_SECURITY_GROUP} and add the inbound and outbound rules “Protocol=TCP, Port Range=80, Source|Destination=::/0” and/or “Protocol=TCP, Port Range=443, Source|Destination=::/0”.
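
For reference, here is the same sequence as an AWS CLI sketch; all resource IDs, the IPv6 CIDR and the load balancer ARN are placeholders you have to replace with your own values:

# 1. request an Amazon-provided IPv6 CIDR for the VPC
aws ec2 associate-vpc-cidr-block --vpc-id vpc-12345678 --amazon-provided-ipv6-cidr-block
# 2. assign an IPv6 /64 out of the VPC block to each ALB subnet
aws ec2 associate-subnet-cidr-block --subnet-id subnet-12345678 --ipv6-cidr-block 2001:db8:1234:1a00::/64
# 3. route all IPv6 traffic through the internet gateway
aws ec2 create-route --route-table-id rtb-12345678 --destination-ipv6-cidr-block ::/0 --gateway-id igw-12345678
# 4. switch the ALB to dualstack
aws elbv2 set-ip-address-type --load-balancer-arn arn:aws:elasticloadbalancing:... --ip-address-type dualstack
# 5. allow HTTPS over IPv6 in the load balancer's security group
aws ec2 authorize-security-group-ingress --group-id sg-12345678 --ip-permissions 'IpProtocol=tcp,FromPort=443,ToPort=443,Ipv6Ranges=[{CidrIpv6=::/0}]'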

As soon as you have enabled dualstack mode for the ALB, AWS propagates a new AAAA DNS record for the load balancer. This takes a few minutes. You can access the load balancer over IPv6 by using the load balancer’s existing CNAME (the same one you use for IPv4). The load balancer itself forwards HTTP requests to the backend servers over IPv4, so the EC2 instances do not need an IPv6 address of their own.
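
A quick way to verify this once the record has propagated (the ALB hostname and the URL are placeholders for your own setup):

# the ALB DNS name should now resolve to both A and AAAA records
dig +short A my-alb-1234567890.eu-central-1.elb.amazonaws.com
dig +short AAAA my-alb-1234567890.eu-central-1.elb.amazonaws.com
# force curl to connect over IPv6
curl -6 -I https://www.example.com/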

How to programmatically insert versioned initial data into Spring Boot applications

One of the commonly required tasks for an application using a persistence store is to initialize the underlying database with basic data sets. Most of the time this means something like admin users or default roles.

Setting the stage

To give a proper example, we have the database table role with the two columns id (primary key) as an internal ID and uuid (primary key) as an external key, plus a name column.
In Liquibase, our changeset for this table has the following definition:

	<changeSet author="schakko" id="schema-core">
		<createTable tableName="role">
			<column name="id" type="BIGSERIAL" autoIncrement="true">
				<constraints nullable="false" primaryKey="true" unique="true"
					uniqueConstraintName="unq_role_id" />
			</column>
			<column name="uuid" type="UUID">
				<constraints nullable="false" primaryKey="true" unique="true"
					uniqueConstraintName="unq_role_uuid" />
			</column>
			<column name="name" type="varchar(255)">
				<constraints nullable="false" unique="true" />
			</column>
		</createTable>
	</changeSet>

My requirements are:

  • I want to add multiple custom roles into this table
  • The uuid field must be randomly generated
  • The schema definition must work on H2 and PostgreSQL without the uuid-ossp module. Our application backend is responsible for the generation of UUIDs.

Initializing databases with Spring Boot’s native features

With Java, specifically Spring Boot, there are two ways to initialize the database:

  1. Hibernate, and therefore Spring Boot with JPA, checks for a file named import.sql in the root of the classpath. This file is executed on startup when Hibernate creates the schema.
  2. The file data.sql, or data-${platform}.sql for a specific DBMS, is used for importing SQL data through the plain JDBC datasource, without any JPA involved.

For simple tasks, both options are feasible. But in our case they can’t fulfil the requirements: a common SQL UUID generator function like generate_uuid() does not exist and probably won’t ever be standardized in SQL. So we would need two separate data.sql files, one for each database management system. And even then we still don’t have access to the OSSP module for generating a UUID in PostgreSQL.

Inserting data programmatically

Why not use a simple ApplicationListener to generate the roles during the startup of the Spring framework?

@RequiredArgsConstructor
@Component
@Order(Ordered.HIGHEST_PRECEDENCE)
public class InsertRoleStamdata implements ApplicationListener<ApplicationReadyEvent> {
	@NonNull
	private final RoleRepository roleRepository;

	public void onApplicationEvent(ApplicationReadyEvent event) {
		if (roleRepository.count() > 0) {
			return;
		}

		roleRepository.save(new Role("ADMIN", java.util.UUID.randomUUID()));
	}
}

This obviously works and is executed on every application startup. With the if condition we ensure that we only insert a role if no role is present yet.
But what happens if the role ADMIN has to be renamed to ADMINISTRATOR? If you think about it, the code above can rapidly turn into an ugly monster with various condition checks and edge cases. And if you want to refactor it and split the migration into different classes, you have to preserve the execution order of the listeners, and so on.
Besides all that, we need some traceable versioning.

Using a schema migration tool

For obvious reasons, a schema migration tool like Liquibase or Flyway should be the way to go. But how can it fulfil our requirements?

In Liquibase we can define a changeset which uses the insert tag:

    <changeSet author="schakko" id="role-stamdata">
        <insert tableName="role">
            <column name="uuid" value="${random_uuid_function}"/>
            <column name="name" value="ADMIN"/>
        </insert>
    </changeSet>

This is fine, but as already mentioned:

Neither Flyway nor Liquibase is able to interpolate a variable placeholder (like ${random_uuid_function}) with a function callback defined in Java.

Using a schema migration tool programmatically

Fortunately, Flyway and Liquibase both support programmatically defined changesets: you can write Java code which executes the SQL statements. In Liquibase you have to use the customChange tag. The following snippet shows the required definition in YAML:

databaseChangeLog:
     - changeSet:
         id: create-default-roles
         author: schakko
         changes:
             - customChange:
                 class: de.schakko.sample.changeset.DefaultRoles20171107

The class de.schakko.sample.changeset.DefaultRoles20171107 must implement the interface CustomTaskChange:

public class DefaultRoles20171107 implements CustomTaskChange {

	@Override
	public String getConfirmationMessage() {
		return null;
	}

	@Override
	public void setUp() throws SetupException {
	}

	@Override
	public void setFileOpener(ResourceAccessor resourceAccessor) {
	}

	@Override
	public ValidationErrors validate(Database database) {
		return null;
	}

	@Override
	public void execute(Database database) throws CustomChangeException {
		// unwrap the JDBC connection provided by Liquibase and hand it to a Spring JdbcTemplate
		JdbcTemplate jdbcTemplate = new JdbcTemplate(new SingleConnectionDataSource(
				((JdbcConnection) database.getConnection()).getUnderlyingConnection(), false));
		jdbcTemplate.update("INSERT INTO role (uuid, name) VALUES (?, ?)",
				java.util.UUID.randomUUID(), "ADMIN");
	}

}

Liquibase’s Spring Boot auto-configuration runs at an early stage at which Hibernate is not yet loaded. Because of this we can’t inject any Spring Data JPA repositories by default. Even accessing the Spring context is not that easy: you would need to expose the application context through a static attribute, and so on.
With Flyway the Spring integration is much better.

Conclusion

This blog post demonstrated how initial data can be inserted into a Spring Boot application’s database. In addition we discussed how this data can be versioned in a database-independent manner.

Website moved to new Uberspace with HTTPS

After migrating my domain to Route 53 I finally transferred my website to a new Uberspace host which supports Let’s Encrypt. You should be automatically redirected to HTTPS when visiting www.schakko.de.
The whole procedure took 2 hours, including setting up the new Uberspace, importing the existing databases and changing the DNS records. Most of this was straightforward as the Uberspace team provides really good documentation for it.

BTW: Route 53 sets the TTL for each DNS record to 300 seconds by default. In most cases a TTL of 1 day should be sufficient; more DNS queries simply mean a higher bill.
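
If you prefer the AWS CLI over the console, raising the TTL boils down to an UPSERT of the record set. A minimal sketch; the hosted zone ID and the IP address are placeholders, and since UPSERT replaces the whole record set the value has to match your host’s current IP:

aws route53 change-resource-record-sets \
  --hosted-zone-id Z1EXAMPLE12345 \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "www.schakko.de.",
        "Type": "A",
        "TTL": 86400,
        "ResourceRecords": [{ "Value": "203.0.113.10" }]
      }
    }]
  }'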

Fixing periodically occurring WiFi lags when running Claymore’s Ethereum miner

This is a blog post which literally drove me crazy for a week. After building our mining rig I experienced a bad WiFi connection with high pings, periodically occurring every 30 seconds.
Just scroll down to see my – fairly simple – solution.

Getting into the mining business

A few weeks ago some of my co-workers and I decided to build a simple mining rig to earn some Ethereum tokens. The Ethereum exchange rate has fallen over the last few days, but it is what it is. Anyhow, we bought 12 Nvidia GTX 1070 cards, 12 riser cards, 2 mainboards, 4 PSUs with 600 W each and a wattmeter. We assembled everything into an open metal cabinet, put an access point (Linksys with DD-WRT firmware) on top of it and connected the mainboards to the access point.
I have to say that the mining rig itself is located in one of our flats, in my study room. The access point on top of the cabinet acts as a wireless bridge to our other flat. Both mainboards and my workstation are connected to the access point with Ethernet cables. The other flat contains an additional access point with a cable modem and internet connectivity. Nothing fancy.
We switched from ethminer to Claymore’s Dual Ethereum miner due to some problems handling multiple cards and wallets. In the end the rigs worked like a charm.

Experiencing lags in Overwatch

Two days later I wanted to play an Overwatch match on my workstation, which is also located in my study room. The ping was unstable, and a simple ping command showed that I had random timeouts and that the ping spiked from 20 ms to > 1500 ms every 30 seconds for a few seconds. This had not happened before the mining rigs became active.

“This must be a software problem of Claymore’s miners”

My first guess was that it had to be a software problem of Claymore’s miner. One of my co-miners had tested a single mainboard with one GPU at his home before and everything worked flawlessly. I started to analyze the problem:

  • Killed each Claymore miner process on rig1 and rig2: no lag occurred.
  • Started a single Claymore miner process: a lag of > 600 ms occurred every 30 seconds, starting when the first Ethereum share was received. This indicated a problem with the network implementation of Claymore’s miner or some high bandwidth usage. I checked the bandwidth, but one Claymore miner instance requires just 12 kBit/s.
  • Started tcpdump on rig1 to identify any conspicuous network activity or packets. Neither UDP nor TCP traffic was eye-catching. I could only correlate the receipt of Ethereum shares with the latency spikes. The network bandwidth in use was still low.

“This must be a network problem with Claymore’s miner”

The last application with which I had slightly similar problems was Subversion. 10 years ago SVN sometimes failed to commit data. It turned out that TortoiseSVN struggled with special packets, the MTU size of our company network and the MTU size of our ADSL connection. Because of this, I changed the MTU size of the rig running the single Claymore process. It did not change anything.

Before I tried something else I disabled the network-related services firewalld and chronyd – without success. Running strace on the miner did not show anything special either.

“This must be a problem with the Ethereum protocol and DD-WRT”

An interesting observation was that the pings from rig -> ap2 (bridge) -> ap1 (router) -> internet and from workstation -> ap2 (bridge) -> ap1 (router) -> internet were both bad, but pinging directly from the main access point ap1 (router) -> internet showed no problem at all. What the hell?
I suspected that some TCP settings on ap2 (bridge) led to these hiccups. Luckily I could check the network settings and stats of both access points (bridge and router) as they are both running DD-WRT. As you can imagine: there were no suspicious network stat (TCP/UDP) changes when a spike occurred.

Could this be a hardware problem?

As I could not see any problem in the software or on the network layer (>= L2), there could only be a generic hardware problem or some L1 error.
During my TCP stats investigation on the access points, I noticed that the WiFi rate of the bridge (ap2) was unstable and fluctuated heavily. This was highly unusual, as it had not happened before we built the rigs.
To exclude any directly network-related problems I did the simplest possible thing: I pulled the Ethernet cables of both rigs (running one active miner process each) so they were no longer connected to the access point. To my surprise I still had network lags. WTF?
After killing both miner processes the network lags went away. So this obviously had to be a problem with the GPU load that the mining process creates.

To give you some insight: due to some DD-WRT restrictions the bridge between both access points uses 2.4 GHz and not 5 GHz. Could this be some interference on the wireless layer?
After googling for “gpu” and “spike”, some links caught my eye.

After reading both posts

  • I changed the WiFi channel from 1 to 11
  • I removed the DVI cable from a TFT connected to one rig
  • I removed the USB keyboard connected to one rig

Nothing changed. This was the point at which I nearly gave up. The last thing to test was using another power connection: ap2 and all 4 PSUs of the rigs were connected to the same connector (psu1,psu2,psu3,psu4) -> wattmeter -> wall socket. Maybe voltage spikes under GPU load were confusing the access point’s hardware?

Changing the wall socket

There was no free wall socket available behind the cabinet containing both rigs. So I moved the access point from the top of the rig to the floor and a few centimeters towards the other wall. After the access point had power and was connected to ap1 (router) again, the network spikes dropped from 1600 ms to 800 ms. Uhm? I moved ap2 another 20 centimeters away from the cabinet. The spikes went down to 400 ms.

The revelation

At a distance of 1.50 meters between rig and access point, no more spikes occurred. I double-checked whether the different wall socket had been the solution, but switching from that wall socket back to the wattmeter-connected connector made no difference.
So simple: just move the access point away. This whole thing drove me crazy for at least 5 afternoons. I felt so stupid.

The high load of the GPUs when running the Ethereum mining process apparently produces either a signal at 2.4 GHz directly (which is rather unlikely) or a harmonic around 1.2 GHz (which is more likely). I assume that the spike every 30 seconds occurs when both rigs receive the same mining job at almost the same time and start mining. If anybody has more information, just let me know. I am highly interested in the technical explanation for this.

Transferring DNS from Uberspace to AWS Route 53

Vacation time means administration time. I am one of those Uberspace customers whose domain was registered and managed not by an external DNS registrar but by Uberspace (or Jonas Pasche) itself. For a few years now Uberspace has no longer been offering this service. Actually this was not a problem and everything worked fine. The reason why I had to deal with it was that I originally wanted to enable Let’s Encrypt for www.schakko.de – for obvious reasons. My space was still hosted on an older Uberspace server running CentOS 5, which has no Let’s Encrypt integration. To use LE I had to move to a newer Uberspace server and simply point my DNS records to the new host’s IPv4/IPv6 addresses.
This was the point where I thought about asking the Ubernauten to just change the DNS registration, and everything would have been fine. But to be honest, I did not want the Ubernauten to follow some no longer supported procedures. I am a developer myself and know exactly how upsetting this can be. So I thought about alternatives and decided to go with AWS Route 53. This is by no means the cheapest solution, but I am planning to use AWS for my future private projects, so it fits best.

Preparing the current DNS entries

Route 53 requires that the contact information for the domain registrant (= domain owner, or “Domaininhaber” in denic-speak) contains a valid e-mail address. This address is later used for verifying the domain ownership. As I could not edit this information myself, I asked the Uberspace admins to change the e-mail address. Some information you might find useful:

  1. Contact information can be hidden to protect the privacy of the owner. This includes the e-mail address of the domain registrant.
  2. denic.de’s whois service does not show that this information is hidden because of privacy protection.
  3. denic.de’s field last update (“Letzte Aktualisierung” in German) does not get updated when the e-mail address is updated.

The last two points highly irritated me, as I thought nothing had changed. Nevertheless, after having waited for two days, I started the domain transfer.

Setting up the hosted zone

In AWS’ Route 53 administration panel you need to go to Hosted zones and click Create Hosted Zone. The following record sets have to be created:

Name     Type   Value                       Description
(empty)  A      IPv4 address of your host   see Uberspace datasheet
(empty)  AAAA   IPv6 address of your host   see Uberspace datasheet
(empty)  MX     0 xyz.uberspace.de          xyz is your current Uberspace host. Do not forget the leading 0!
www      A      IPv4 address of your host   see Uberspace datasheet
www      AAAA   IPv6 address of your host   see Uberspace datasheet

Required record sets

Transferring the domain from Uberspace to Route 53

  • Log in to your AWS account and select Route 53
  • Go to Registered Domains > Transfer Domain
  • After having entered the domain name and selected the TLD, you have to provide the Authorization code. In my case this had been placed in the ~/authcode file by the Uberspace guys.
    For the name server options you can either select Continue to use name servers provided by the current registrar or DNS service or Import name servers from a Route 53 hosted zone that has the same name as the domain. I mistakenly used the first option (see below); you should go with the second one, so that Route 53 replaces the current NS entries with those of the previously created hosted zone.

    Authorization code and name servers

  • After clicking on Continue you have to provide your contact information and make sure the checkbox Hide contact information if the TLD registry, and the registrar, allow it is checked.

A few minutes after you have purchased the domain, you will receive two e-mails:

  • Final transfer status from domainnameverification.net
  • Transferring to Route 53 succeeded from AWS

All in all it took no longer than 10 minutes. Fun fact: I did not receive an e-mail with a verification link. Providing the authcode seems to be sufficient.

Changing the nameservers in Route 53

As I have already written, I mistakenly left the nameservers of my domain pointing to nsX.jonaspasche.com. To change the NS entries you just have to go to Registered domains > $YOUR_DOMAIN > Add or edit name servers and replace the entries with the NS entries from your hosted zone (xyz.awsdns-xyz.net and so on).

Update current nameservers

Please note that updating the NS entries takes some time. The TTL for jonaspasche.com is set to 3600 seconds, so I had to wait around an hour until all my changes had been propagated.
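
To check whether the new name servers are already visible, you can query the NS records with dig:

# should list the xyz.awsdns-xyz.* name servers of your hosted zone
dig +short NS schakko.de
# ask Google's public resolver as a second opinion
dig +short NS schakko.de @8.8.8.8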

Running multiple Claymore miner instances with different wallets and GPUs

A few days ago I switched from ethminer to Claymore’s Dual Ethereum Miner because ethminer has problems running multiple instances with multiple GPUs. My blog post How to run same ethminer instance with multiple GPUs is still valid, but ethminer simply can’t handle two or more instances running in parallel.

In addition to my previous blog post I want to show you how to mine with multiple GPUs into different Ethereum wallets. The following has been tested with Fedora 25.

Add a configuration file for each of your wallets

For each of your wallets create a configuration file named /etc/sysconfig/claymore-miner-$WALLETNAME with this content:

ETHER_ADDRESS=0x$YOURETHERADDRESS
SIA_ADDRESS=$YOURSIACOINADDRESS
EMAIL_ADDRESS=$EMAIL
## GPUs must *not* be separated by comma or whitespace!
## Use the first three GPU cards on your mainboard
GPUS=012

Create a systemd service template

Create the file /etc/systemd/system/claymore-miner@.service and paste the configuration into it:

[Unit]
Description=Ethereum miner for %i
After=network.target


[Service]
User=nobody
EnvironmentFile=/etc/sysconfig/claymore-miner-%I
ExecStart=/bin/bash --login -c "/opt/claymore-miner/ethdcrminer64 -epool eth-eu1.nanopool.org:9999 -ewal ${ETHER_ADDRESS}/$YOUR_RIG_NAME/${EMAIL_ADDRESS} -nofee 0 -mport 0 -espw x -ftime 10 -di ${GPUS} -mode 1"
# -mode 0 -dcoin sia -dpool sia-eu1.nanopool.org:7777 -dwal ${SIA_ADDRESS}/$YOUR_RIG_NAME/${EMAIL_ADDRESS}"
## If you want to dual-mine Siacoin, uncomment the line above and remove '-mode 1"' in the line before
Restart=always

[Install]
WantedBy=multi-user.target

  • Replace $YOUR_RIG_NAME with the name of your rig; whitespace is not allowed
  • Uncomment the -mode 0 line (and remove -mode 1) to enable dual mining mode
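
After that, one miner instance per wallet can be started by hand; mywallet is just an example name matching a /etc/sysconfig/claymore-miner-mywallet configuration file:

# pick up the new template unit
systemctl daemon-reload
# start and inspect one instance per wallet
systemctl start claymore-miner@mywallet
systemctl status claymore-miner@mywallet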

Enable the service

We have delayed the start of each miner after booting by adding a simple crontab entry:

@reboot sleep 60s; systemctl start claymore-miner@$WALLETNAME1; systemctl start claymore-miner@$WALLETNAME2

If you like it, and want say thank you, you can drop me some wei at 0x4c1856c9021db812f0b73785081b245f622d58ec 🙂

How to pass multiple parameters to systemd’s ExecStart – Running the same ethminer instance with multiple GPUs with systemd

For our Ethereum mining rig a coworker of mine wrote a systemd template unit, so it is relatively easy to configure which graphics card in the rig is assigned to whom.

For each GPU owner there is a custom configuration file /etc/sysconfig/ethminer-$USERNAME (/etc/sysconfig/ethminer-ckl in my case). The file contains the following parameters:

ETHER_ADDRESS=0xYOUR_HASH
EMAIL_ADDRESS=<your email address>
# use first three GPUs, last three would be 3 4 5
OPENCL_DEVICES=0 1 2

Each user has their own configuration file and each service can be started like this:

systemctl start ethminer@ckl

But when ethminer was started via systemctl start, only the first GPU in the definition was used – GPU 0 in the configuration sample above. systemd itself called the ethminer binary in a correct way and the same command line worked when executed by hand. The problem lies in how systemd passes arguments and how ethminer reads them: a variable referenced as ${OPENCL_DEVICES} is handed to the binary as a single argument (“0 1 2”) instead of being split into separate words, so ethminer apparently only picked up the first device. In the end I fixed it by wrapping the ethminer command in a sub-bash process. Our unit definition in /etc/systemd/system/ethminer@.service looked like this:

[Unit]
Description=Mine Ether for %i
After=network.target


[Service]
User=nobody
EnvironmentFile=/etc/sysconfig/ethminer-%I
# BEFORE AND NOT WORKING
# ExecStart=/usr/bin/ethminer --farm-recheck 2000 -G -S eth-eu1.nanopool.org:9999 -O ${ETHER_ADDRESS}/rig02/${EMAIL_ADDRESS} --opencl-devices ${OPENCL_DEVICES}

ExecStart=/bin/bash --login -c "/usr/bin/ethminer --farm-recheck 2000 -G -S eth-eu1.nanopool.org:9999 -O ${ETHER_ADDRESS}/rig02/${EMAIL_ADDRESS} --opencl-devices ${OPENCL_DEVICES}"

Restart=always

[Install]
WantedBy=multi-user.target

Attempting to perform the InitializeDefaultDrives operation on the ‘FileSystem’ provider failed.

Yesterday one of our check_mk monitoring scripts based on PowerShell failed because the error Attempting to perform the InitializeDefaultDrives operation on the ‘FileSystem’ provider failed. showed up. The initial reason for this is unknown but has to do with the New-PSDrive and Remove-PSDrive PowerShell commands we use inside the check_mk scripts.

Symptoms for the problem are:

  • The network drive is shown as disconnected in the Windows Explorer but you can still open the network drive
  • The network drive cannot be disconnected; you receive the error “Drive is not connected” or “Access Denied”
  • Get-PSDrive returns the network share
  • net use does not return the network share
  • The network drive can’t be removed with net use $YOUR_DRIVE /delete /y
  • The network drive can’t be removed with Remove-PSDrive
  • Restarting the Windows Explorer still shows the drive

To remove the network drive I tried removing all registry keys belonging to it, killing the check_mk agent, restarting the virtual machine and so on. In the end I fixed it by disabling the check_mk agent, restarting the VM and re-enabling the check_mk agent.

Update 2017-06-14: The error occurs because the check_mk agent runs under the Local System account. As a normal user or administrator you have no permission to remove the drive. A good indicator for this is the existence of the mapped network drive in the registry key HKEY_USERS\S-1-5-18\Network\$YOUR_DRIVE (S-1-5-18 is the Well Known SID for Local System account). You can delete the registry key but the drive still exists until you reboot the Windows instance. In some circumstances even then the drive still shows up.

Microsoft suggests setting the registry value HKLM\System\CurrentControlSet\Control\Session Manager\ProtectionMode from 1 to 0. This value allows an administrator to access system resources. Just don’t do it, for the sake of security.

Luckily for us we can use Mark Russinovich’s psexec tool to gain Local System permissions. With this we can safely remove the drive without a reboot.

# start an interactive PowerShell running as Local System
psexec64 -i -s powershell.exe
# inside that PowerShell session, remove the drive
Remove-PSDrive -Name $YOUR_DRIVE
# the drive disappears from Explorer and from Get-PSDrive output of non-Local System users

Cross-domain migration from Windows Server 2008 R2 to Windows Server 2012 R2

In the first weeks of our company I made the mistake of setting up the Active Directory domain with a .local suffix, which caused a lot of problems with Windows and Linux-based clients in the long run. Besides that, after I had shifted my work back to my original tasks – I am still a developer – the network infrastructure went into some kind of unplanned chaos. Too many people tried too many things for which they had no experience and no sense of the implications. In addition, we got a parallel FreeIPA domain nix.internal which had a domain trust with the Windows domain. Due to the complexity no one really tried to push the domain migration, even though it had been a long-outstanding issue on our Kanban board.

Because of internal changes I took over the ownership of the infrastructure a few weeks ago. To put the “structure” back into “infrastructure”, I immediately started planning the final migration phase together with one of my co-workers.

Setting the goals

The new design of our infrastructure had already been worked out over the last months.

  1. Instead of having the two domains domain.nix.internal and domain.local the new domains should be ad.domain.de and ipa.domain.de. ad.domain.de is managed by Active Directory, ipa.domain.de by FreeIPA. Both domains have a bi-directional trust.
  2. The servers containing the domain controller and the Microsoft Exchange server must be both migrated from Windows Server 2008 R2 to Windows Server 2012 R2.
  3. The Exchange server must be migrated from Microsoft Exchange 2010 to Exchange 2016.

Besides that we had a lot of other goals like a global naming schema for hosts, CNAME usage, consolidating our VMs and other things but this is out of scope for this article.

I prepared a dedicated JIRA project containing all the tasks we had to solve.

Setting up the new domain

Setting up the new Active Directory and FreeIPA servers was straightforward. Domain trusts between domain.local and ad.domain.de as well as between ad.domain.de and ipa.domain.de were established. I had to manually change the file permissions on our Linux-based file store as our permission concept had also changed: instead of assigning user permissions to files or directories we wanted to use security groups. Best practice. This took some time but was worth the effort, as I fixed a lot of permission problems during the manual review.

After setting up the domain itself I prepared the (inactive) DHCP server, imported the existing settings via PowerShell from the domain.local controller, set up the DNS forwarding and so on.

Migration of Microsoft Exchange 2010 to 2016

It is one thing to do a migration from Exchange 2010 to a newer version but a completely different story to do a cross-domain migration of Exchange. Google’s results for such a constellation are relatively sparse. Microsoft has not supported such a migration since Exchange 2007, and using PowerShell and custom scripts does not work either. We ended up buying CodeTwo Exchange Migration, which saved us a lot of time and pain. If you ever need to do a cross-domain migration of Exchange, purchase a license. It is worth every cent.

After the initial installation of the new Microsoft Exchange 2016 server and joining it to the new ad.domain.de domain, we set up the new Exchange server as a mail relay in our old Exchange Server 2010; both servers were listening for the same e-mail domain. This approach allowed us to test the new mail server with the existing domain. All other server settings were either exported and imported using PowerShell or configured by hand.

On the migration day we disabled the POP3 collector on our old Exchange 2010, reconfigured the proxy server to point to the new Exchange 2016, did a last CodeTwo Exchange Migration run and enabled the POP3 collector on the new Exchange 2016. Apart from some hiccups with the internal/external MAPI URL of the Exchange server, this went surprisingly smoothly.

Migrating clients into the new domain

The migration of the existing clients (PCs, notebooks) took much longer than expected. The migration of the local profiles did not work on all workstations. During the next logon the users were presented with the error “There are Currently No Logon Servers Available” (“Es stehen momentan keine Anmeldeserver zur Verfügung”). I figured out that this was a problem with the DNS configuration of the clients. During the join process the new DHCP server was still disabled, and the clients used the old domain controller for domain.local as their DNS resolver. That server had a forwarder for ad.domain.de, so the clients were able to join the new domain. But during the logon process the clients asked that DNS server for a SRV record and got the old domain returned, resulting in the error above. After disabling the old DHCP server, enabling the new DHCP server, manually setting the DNS server to the new domain controller and re-joining the new domain, the logon issue was gone.

Status Quo

There is still a lot to do, e.g. our Atlassian instances are still connected to the old domain. As the security groups are completely different in both domains, I’ll have to fix this by hand. Maybe I’ll write a blog post on that, too.

Lessons learned

  1. If you don’t really need to do a cross-domain migration, don’t do it.
  2. Planning, planning, planning. I spent two thirds of the time I invested in this project on planning.
  3. Don’t underestimate the effort. Even if you have a good plan and everything is prepared, there is so much that can go wrong. The week between Christmas and New Year was the only time we could do this without having a large impact on our business.
  4. Don’t trust any profile migration tool. Reset the users’ passwords and log in with their credentials to verify the migration.
  5. If you don’t really need to do a cross-domain migration, don’t do it.