Using Atlassian OpsGenie with a localized on-premises Jira instance

We are currently in the process of migrating our alerting infrastructure from OMD to Atlassian’s OpsGenie. Most of the features (SMS, phone calls, etc.) worked out of the box, but we struggled with pushing alerts back into our on-premises Jira instance.

Enable logging of POST requests

OpsGenie does not provide debug logs of the HTTP requests it executes against Jira’s REST API. Instead, only a very generic HTTP status code is provided (in our case, HTTP 400 Bad Request).

Lucky for us, our Jira instance is running behind an Apache HTTPD webserver acting as a proxy. With the help of the mod_security module we were able to trace the communication between OpsGenie and our Jira instance:

LoadModule security2_module modules/mod_security2.so

<VirtualHost *:443>
# ...
    <IfModule mod_security2.c>
       SecRuleEngine On
       SecAuditEngine On
       SecAuditLog /var/log/httpd/modsec_audit.log
       SecRequestBodyAccess on
       SecAuditLogParts ABIJDFHZ
    </IfModule>
</VirtualHost>
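
With the audit log in place, a quick way to watch what OpsGenie actually sends is to follow the log while triggering a test alert. A minimal sketch (the log path matches the SecAuditLog directive above; the grep pattern and context size are just examples):

# follow the audit log and show OpsGenie's requests against Jira's issue API
tail -f /var/log/httpd/modsec_audit.log | grep --line-buffered -A 20 "POST /rest/api/2/issue"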

Configuring the Jira workflow scheme

OpsGenie requires you to have at least one workflow with the following status transitions:

  • TODO/Open -> In Progress
  • In Progress -> Resolved

It is important that

  1. the statuses are named exactly “Resolved” and “In Progress”, as OpsGenie’s internal Jira connector is case-sensitive
  2. you do not confuse the status name with the status category

In English-based Jira installations this should not be an issue, but in our localized German environment we had to add both statuses under Vorgänge > Status and add the English translation to each status.

To check the correct names, you can access the REST API of your Jira instance at https://jira/rest/api/2/issue/${ISSUE_KEY}/transitions. The transitions[].to.name field inside the JSON response must match the statuses above, like

{
	"expand": "transitions",
	"transitions": [
		{
			"id": "41",
			"name": "Öffnen",
			"to": {
				"self": "https://jira/rest/api/2/status/10617",
				"description": "Der Vorgang wird aktuell nicht bearbeitet und wurde noch nicht vollständig fertig gestellt.",
				"iconUrl": "https://jira/images/icons/statuses/open.png",
				"name": "Offen",
				"id": "10617",
				"statusCategory": {"self": "https://jira/rest/api/2/statuscategory/2", "id": 2, "key": "new", "colorName": "blue-gray", "name": "Aufgaben"}
			}
		},
		{
			"id": "61",
			"name": "Resolve",
			"to": {
				"self": "https://jira/rest/api/2/status/5",
				"description": "Resolved",
				"iconUrl": "https://jira/images/icons/statuses/resolved.png",
				"name": "Resolved",
				"id": "5",
				"statusCategory": {"self": "https://jira/rest/api/2/statuscategory/3", "id": 3, "key": "done", "colorName": "green", "name": "Fertig"}
			}
		}
	]
}
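
From the command line, the same check can be scripted with curl. A small sketch, assuming Basic auth credentials in $JIRA_USER/$JIRA_PASSWORD, jq being installed and ${ISSUE_KEY} being an issue key inside the OpsGenie project:

# print the target status name of every available transition
curl -s -u "$JIRA_USER:$JIRA_PASSWORD" \
	"https://jira/rest/api/2/issue/${ISSUE_KEY}/transitions" | jq -r '.transitions[].to.name'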

After we had configured the workflow scheme, OpsGenie was able to create issues and transition them to the In Progress status.

Configuring the transition screen for resolving open alerts

When we tried to close an open alert in OpsGenie, Jira failed with the HTTP 400 error described above. In our mod_security logs we saw the following output:

POST /rest/api/2/issue/${ISSUE_KEY}/transitions HTTP/1.1
Accept: text/plain, application/json, application/*+json, */*
Content-Type: application/json;charset=UTF-8
....
Accept-Encoding: gzip,deflate

--8ddfb330-C--
{"transition":{"id":"61"},"fields":{"resolution":{"name":"Done"}}}
--8ddfb330-F--
HTTP/1.1 400 Bad Request

The transition.id 61 pointed to the transition from In Progress to Resolved, but its screen was obviously missing the “resolution” field. You can easily check the fields of a transition by accessing the issue’s transition configuration: https://jira/rest/api/2/issue/${ISSUE_KEY}/transitions?transitionId=61&expand=transitions.fields.
We added the missing Lösung (Resolution) field to the transition’s screen, but the error still occurred.
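
To reproduce the failing request without going through OpsGenie, you can replay it from the mod_security log with curl. Again only a sketch with Basic auth placeholders and a test issue that is currently In Progress:

# replay the transition request OpsGenie sends when closing an alert
curl -s -u "$JIRA_USER:$JIRA_PASSWORD" \
	-H "Content-Type: application/json" \
	-X POST "https://jira/rest/api/2/issue/${ISSUE_KEY}/transitions" \
	-d '{"transition":{"id":"61"},"fields":{"resolution":{"name":"Done"}}}'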

Translating the “Resolution” values

Again, the resolution values have to be translated so that the resolution is called “Done” and not “Fertig”. You can change the translations at https://jira/secure/admin/ViewTranslations!default.jspa?issueConstantType=resolution.
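
To double-check which resolution names the REST API returns for your user (and whether “Done” is among them), Jira’s resolution endpoint can be queried as well. A sketch with the same Basic auth and jq assumptions as above:

# list all resolution names known to the Jira instance
curl -s -u "$JIRA_USER:$JIRA_PASSWORD" "https://jira/rest/api/2/resolution" | jq -r '.[].name'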

In the end, everything is working and OpsGenie is now able to create issues and move them through the expected statuses/transitions.

Receiving “com.amazonaws.services.s3.model.AmazonS3Exception: Not Found” when using Jenkins’ pipeline-aws-plugin and s3Upload step with Minio

I am currently working on a Jenkins declarative pipeline to connect the Jenkins builds with Kubernetes, Helm and Netflix Spinnaker. One of the TODOs was to deploy different artifacts (e.g. a Helm chart my-chart-0.0.1.tar.gz) to an S3-compatible bucket inside a Minio installation with the help of the pipeline-aws-plugin.

When running

withAWS(endpointUrl: 'https://minio.domain.tld', credentials: config.credentialsId) {
	s3Upload(file: "my-file.txt", bucket: "my-bucket")				
}

my pipeline always threw an exception with

com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: null; S3 Extended Request ID: null), S3 Extended Request ID: null
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1695)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1350)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1101)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:758)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:732)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:714)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:674)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:656)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:520)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4705)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4652)

Trying other clients with Minio

At first I suspected some misconfiguration of my Minio installation. I checked the S3 upload with mc and AWS’ own CLI (see the sketch below). Both worked flawlessly, so it had to be something else.
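
The checks looked roughly like this; a sketch in which the alias, credentials and file name are placeholders:

# upload via Minio's own client
mc config host add myminio https://minio.domain.tld "$ACCESS_KEY" "$SECRET_KEY"
mc cp my-file.txt myminio/my-bucket/

# upload via the AWS CLI, pointed at the Minio endpoint
aws --endpoint-url https://minio.domain.tld s3 cp my-file.txt s3://my-bucket/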

Enable logging

To get some more debugging output, I configured Jenkins to log events for com.amazonaws and org.apache.http.wire. The debugging output does not show up inside the build job’s console output but only under the configured logger.

Host-style access to S3 buckets

After scanning the debug output, I noticed the following:

http-outgoing-11 >> "PUT /my-file.txt HTTP/1.1[\r][\n]"
Jan 21, 2019 9:36:15 PM FINE org.apache.http.impl.conn.Wire wire
http-outgoing-11 >> "Host: my-bucket.minio.domain.tld[\r][\n]"
Jan 21, 2019 9:36:15 PM FINE org.apache.http.impl.conn.Wire wire
http-outgoing-11 >> "x-amz-content-sha256: UNSIGNED-PAYLOAD[\r][\n]"
...
http-outgoing-11 << "[\r][\n]"
Jan 21, 2019 9:36:15 PM FINE org.apache.http.impl.conn.Wire wire
http-outgoing-11 << "default backend - 404"
Jan 21, 2019 9:36:15 PM FINE com.amazonaws.services.s3.internal.S3ErrorResponseHandler createException
Failed in parsing the error response : default backend - 404

When the pipeline-aws-plugin initiates a request to my bucket, it does not call https://minio.domain.tld/my-bucket but https://my-bucket.minio.domain.tld. This is totally fine for AWS S3 buckets, but with the Minio deployment in our Kubernetes cluster it does not work out of the box:

  1. By default, our Minio deployment does not use the --address parameter described in https://github.com/minio/minio/issues/4681
  2. Our Minio ingress also does not listen on fourth-level domains like my-bucket.minio.domain.tld, so the nginx proxy returns the “default backend - 404” string seen in the log output above (see the curl sketch below).
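
The difference is easy to reproduce with curl; a sketch using the hypothetical domain from above:

# path-style URL: the request reaches Minio behind the ingress (it answers with an S3 error document if no credentials are sent)
curl -i https://minio.domain.tld/my-bucket/

# virtual-hosted-style URL: no ingress rule matches the subdomain, so nginx answers with "default backend - 404"
curl -i https://my-bucket.minio.domain.tld/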

Solving the issue

Instead of configuring host-style access, I fixed it by simply setting pathStyleAccessEnabled: true in my s3Upload step. When enabled, the pipeline-aws-plugin does not use the bucket name as a fourth-level subdomain but appends it to the endpoint’s path (https://minio.domain.tld/my-bucket):

withAWS(endpointUrl: 'https://minio.domain.tld', credentials: config.credentialsId) {
	s3Upload(pathStyleAccessEnabled: true, file: "my-file.txt", bucket: "my-bucket")				
}

Migrating InfluxDB from 0.9.6 to 0.10.0 GA

You may have already heard that InfluxDB 0.10 GA was published a few days ago. In my case the most interesting improvement is the much higher compression rate: at the moment my co-workers at NeosIT and I are collecting performance data from four internal virtual machines. After running the collectd daemon on the VMs for about a month, the size of InfluxDB’s data directory had increased by 3.5 GByte with the default retention policy.

Testing the update

After setting up a new virtual machine with InfluxDB 0.9.6 for migration testing, I updated InfluxDB to the current version (Fedora 23 in our case):

sudo wget https://s3.amazonaws.com/influxdb/influxdb-0.10.0-1.x86_64.rpm
sudo dnf install influxdb-0.10.0-1.x86_64.rpm

Two notes:

  1. As mentioned during the update procedure, the location of the configuration file moved from /etc/opt/influxdb/influxdb.conf to /etc/influxdb/influxdb.conf. The old configuration file is neither copied nor merged. You must edit the new configuration file or copy the old configuration to the new location. Keep in mind that 0.10.0 introduces some new settings.
  2. In addition to the changed configuration location, the data directory has been moved from /var/opt/influxdb/ to /var/lib/influxdb. My advice is to move the old folder to the new location and overwrite any new files (see the sketch after this list). Please create a backup of the /var/lib/influxdb folder before doing this. In my case this procedure worked without any problems.
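
A minimal sketch of both moves, assuming you follow the copy route and the stock directory layout from the notes above (keep the backup until you have verified the result):

systemctl stop influxdb

# 1. copy the old configuration to its new location (0.10.0's new settings are not in this file)
cp /etc/opt/influxdb/influxdb.conf /etc/influxdb/influxdb.conf

# 2. back up the freshly created data directory, then copy the old data over it
cp -a /var/lib/influxdb /var/lib/influxdb.bak
cp -a /var/opt/influxdb/. /var/lib/influxdb/
chown -R influxdb:influxdb /var/lib/influxdb

systemctl start influxdb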

Unfortunately it doesn’t seem to be possible to just copy a single database from one separate InfluxDB instance to another. Because of this drawback I imported some data from our virtual machines through Logstash and InfluxDB’s collectd backend. The Grafana dashboards still worked as expected, so there seem to be no breaking changes in the HTTP API, at least as far as I could see.
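
To poke at the HTTP API directly, a couple of curl calls suffice. A sketch assuming the default port 8086, no authentication and a database named collectd:

# list the databases known to the instance
curl -G 'http://localhost:8086/query' --data-urlencode 'q=SHOW DATABASES'

# list the measurements of the imported collectd data
curl -G 'http://localhost:8086/query' --data-urlencode 'db=collectd' --data-urlencode 'q=SHOW MEASUREMENTS'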

Converting data from bz1 to tsm1

Now it was time to convert the data from the old bz1 format to the new tsm1. InfluxDB 0.10 ships with a nifty tool named influx_tsm. The usage is very easy and it worked as expected. Just pass the “-backup” or “-nobackup” parameter and the data location to be converted, and you are done:

systemctl stop influxdb
influx_tsm -backup ~/influxdb_backup -parallel /var/lib/influxdb/data/
systemctl start influxdb

Results of the conversion

In my test environment everything worked as expected, so I migrated our production environment. The influx_tsm conversion took around 50 minutes for 3.6 GByte of data, but the results are awesome: our data directory shrank from 3.6 GByte to 400 MByte. All Grafana dashboards work as expected, as do our collectd and Graphite InfluxDB backends. This is the output of influx_tsm:

# ...
2016/02/07 13:34:29.100186 Still Working: Completed Shards: 29/30 Points read/written: 81626859/81626859
2016/02/07 13:34:31.969792 Conversion of /var/lib/influxdb/data/collectd/default/60 successful (49m41.016488953s)

Summary statistics
========================================
Databases converted: 3
Shards converted: 30
TSM files created: 30
Points read: 81655978
Points written: 81655978
NaN filtered: 0
Inf filtered: 0
Points without fields filtered: 0
Disk usage pre-conversion (bytes): 5239898112
Disk usage post-conversion (bytes): 451150534
Reduction factor: 91%
Bytes per TSM point: 5.53
Total conversion time: 49m48.08131854s

Thanks guys, good job!