Tuesday

Monday: not so successful.

Today:

I thought it would be cool to set up a cluster using CNS (Container Native Storage) to back everything that needs storage. Well, it’s a learning experience at least.

The first thing that happened was that the install broke trying to install iptables-services, because iptables needs to be updated to a matching version at the same time. Not sure if this will be a common problem for others, but I updated my tooling to work around it.
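The manual equivalent of the workaround is roughly this, run on each node before the install kicks off (a sketch; the exact versions involved in the conflict will depend on your repos):

    # Bring iptables up to date first, then install iptables-services so the
    # two packages end up at matching versions and the dependency conflict
    # during the install goes away.
    yum update -y iptables
    yum install -y iptables-services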

Then it turned out I hadn't freed up the block devices that GlusterFS needs on each node, and the CNS deploy failed. Once I fixed that up and ran it again, I got a pretty mysterious error:

    "invocation": {
        "module_args": {
            "_raw_params": "oc rsh --namespace=glusterfs deploy-heketi-storage-1-fhh6f heketi-cli -s http://localhost:8080 --user admin --secret 'srtQRfJz4mh8PugHQjy3rgspHEfpumYC2dnBmQIoX9Y=' cluster list", 
   ...
    "stderr": "Error: signature is invalid\ncommand terminated with exit code 255", 

Turns out the problem was that the heketi pod had been created with the secret in its env variables on the previous run, and then the secret was re-created for the second run, so the value baked into the pod no longer matched. This playbook doesn't handle consecutive runs too well yet. I added the openshift_storage_glusterfs_wipe=True option to the inventory so it would start fresh and tried again. This time it failed differently:

 "invocation": {
 "module_args": {
 "_raw_params": "oc rsh --namespace=glusterfs deploy-heketi-storage-1-nww6c heketi-cli -s http://localhost:8080 --user admin --secret 'IzajucIIGPp0Tm3FyueSvxNs51YYjyTLGvWAqsvfolY=' topology load --jso
n=/tmp/openshift-glusterfs-ansible-dZSjA4/topology.json 2>&1", 
...
 "stdout": "Creating cluster ... ID: 00876e6ce506058e048c8d68500d194c\n\tAllowing file volumes on cluster.\n\tAllowing block volumes on cluster.\n\tCreating node ip-172-18-9-218.ec2.internal ... Unable to cre
ate node: New Node doesn't have glusterd running\n\tCreating node ip-172-18-8-20.ec2.internal ... Unable to create node: New Node doesn't have glusterd running\n\tCreating node ip-172-18-3-119.ec2.internal ... U
nable to create node: New Node doesn't have glusterd running",

But if I rsh into the gluster pods directly and check, glusterd actually is running, so I'm not sure what's going on yet.
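For what it's worth, this is the kind of check I mean; the pod names come from oc get pods, and the gluster image runs glusterd under systemd, so systemctl works inside the pods (adjust the grep if your pods are named differently):

    # Check glusterd in each gluster pod in the glusterfs namespace.
    for pod in $(oc get pods --namespace=glusterfs -o name | grep glusterfs); do
        echo "== $pod"
        oc rsh --namespace=glusterfs "$pod" systemctl is-active glusterd
    done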

jarrpa set me straight on a number of things while trying to help me out here. For one thing, I was thinking GlusterFS could magically be used to back everything once deployed. Not so; you have to define it as a StorageClass (there's an option for that which I hope works once I get the rest sorted out: openshift_storage_glusterfs_storageclass=True). Then you have to make that StorageClass the default (apparently a manual step at this time), and have the other things you want to use it (logging, metrics, etc.) request their storage via dynamic provisioning. Something to look forward to.
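In the meantime, here's roughly what I understand the manual version to look like: a StorageClass pointing at heketi (which the openshift_storage_glusterfs_storageclass option should end up creating for you), plus the default-class annotation. Treat the object names and the resturl below as placeholders; whatever the playbook actually creates will likely differ:

    # Sketch of a GlusterFS StorageClass; the kubernetes.io/glusterfs
    # provisioner and its resturl/restuser/secret parameters are standard,
    # but the names and URL here are placeholders.
    cat <<EOF | oc create -f -
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: glusterfs-storage
    provisioner: kubernetes.io/glusterfs
    parameters:
      resturl: "http://heketi-storage-glusterfs.apps.example.com"
      restuser: "admin"
      secretNamespace: "glusterfs"
      secretName: "heketi-storage-admin-secret"
    EOF

    # Mark it as the default StorageClass so logging, metrics, etc. can rely
    # on dynamic provisioning (older clusters use the
    # storageclass.beta.kubernetes.io/is-default-class annotation instead).
    oc annotate storageclass glusterfs-storage \
        storageclass.kubernetes.io/is-default-class=true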

I worked on a little PR to bring some sanity to the registry-console image.
