The 7 easy steps TIME VAMPIRES hope you won’t use. #6 changed my life!

Vampire Kitty

Picture by Faris Algosaibi, under CC BY 2.0 license (https://creativecommons.org/licenses/by/2.0/). Cropped from original.

You sit down after dinner to research a few home improvement ideas on the web. A funny picture with a link catches your eye and… when bedtime rolls around you’ve read fourteen motivational stories, watched a series of skateboarding turkey videos, grumbled on Facebook about obviously corrupt politicians in another state, shared some really pretty pictures of Antarctica you found, and not done any research! How did this happen? How does this keep happening?

How well you adapt for the rest of the 21st century will depend mainly on how well you defend against the distractions and outright manipulation in the increasing stream of available information. With the rise of social media, the voices screaming “look at me!” have shifted tactics – not only are they trying to get your attention, they’re trying to hijack your friends list. It didn’t take long to discover how to push our buttons; the psychological techniques are becoming well-known and ubiquitous. Every social media whore dreams of “going viral” and will post any outrageous thing if it gets a few million hits to their crappy ad farm.

“Go viral” is a good term for it; these are attention viruses spread via social media, using your resources as a host body to spread themselves. In addition to wasting your time, they tend to be inflammatory, misleading, and low in information. They make the internet a worse place. And I, for one, have gotten sick of seeing the same kinds of headlines trying to suck me in, the same kind of misdirection and manipulation stirring the internet’s collective psyche.

The good news is that you can fight back. The enemy may have your number, but you can learn to recognize the manipulation and avoid it. Here are some current ways – but rest assured that their bag of tricks will continue to adapt (you might want to run a search every so often for the latest).

1. Avoid posts with numbers in the title

Perhaps it’s the sense of accomplishment we feel as we tick through the list. Perhaps it’s the curiosity to see “are there *really* (only) that many?” Perhaps it’s just to see if we’re already smart enough to know the whole list. Maybe numbers are inherently more credible. Whatever the reason, social media experts have figured out that we will pretty much click on (and share) any damn thing presented as a numbered list. Even if you know it will probably tell you nothing new. (Numbered lists aren’t the only offenders; consider, for example, the trend of “99% will fail this simple test!” articles.)

Actual news is not presented as “25 things you didn’t know about XYZ”. Useful advice is not presented as “4 easy steps”. In fact, it’s incredibly rare for it to have any number at all in the title. So if it does, that’s a red flag: you can feel simultaneously smug and smarter about skipping this link because it is undoubtedly an attention virus.

Especially with an addendum like “Number X blew my mind!”

2. Avoid titles with emotionally charged words

“Shocking!” “Amazing!” “You won’t believe…” “Blown away” “Can’t stop laughing/crying” “Epic” ALL CAPS!

These and other attention-seeking techniques are used exclusively by attention-desperate social media whores. Not by actually informative articles. Not by people who have a purpose for writing other than maximizing clicks and shares. It is manipulation, pure and simple. The more prevalent this sort of thing becomes, the more it drowns out balanced, informative writing on the internet. And the more you read it, the more often your blood pressure and cortisol levels will rise needlessly. Don’t click that link! Don’t do it!

3. Avoid posts that you react to with “no way!” or “oh yeah?”

If you can feel your eyebrows rising just from reading the headline (police brutality, stupid politician tricks, “you’ll never guess”), you can bet it’s deliberately misleading in order to shock you and draw you in. Resist. And for the love of Bog, don’t get drawn into threads 100 comments long by people who didn’t even read the article. You will accomplish nothing but making yourself and perhaps a few others angry. Don’t bother unless you’re a troll and that’s how you get your lulz.

4. If you find yourself reading an attention virus, at least avoid sharing it

So you might enjoy following Facebook links for some meaningless entertainment from time to time. I get it. But… do you have to infect your friends too? Do you have to reward these time vampires?

No. You don’t. In fact, with the Information Deluge unfolding this century, it’s your responsibility not to.

If (perhaps by accident) you find yourself visiting one of these, at least keep it to yourself. You cover your mouth when you cough and sneeze, right? Have the same courtesy for your readers’ brains. Or are you one of those people still forwarding stupid chain letters?

5. Use ad-blocking browser plugins

The push for sensationalizing the internet is all about displaying ads. More clicks mean more ad views and more revenue. If you kill the ad revenue, you stop rewarding the behavior.

Also, you don’t need to waste your attention on ads. I have basically never seen an ad on Facebook. I don’t really see any ads on the web other than the occasional text ad or YouTube intro. How? I’ve been using ad-blocking software since it was an esoteric art that required running a local proxy and manually tweaking browser settings and blacklist entries.

It’s a lot easier now. For years it’s been as easy as searching for browser plugins and clicking a few buttons to install them. I know about Adblock Plus for Firefox and Adblock Plus for Chrome. If you’re using something else there’s probably a plugin for that too (even for mobile).

Personally, I think advertising funding is a blight on the internet (“you won’t believe” I once worked for an advertising startup) and would like to see it destroyed in favor of other models. If you disagree and feel morally obliged to allow websites to manipulate your mind in return for the benefit they bring you (or even find targeted ads actually – choke! – informative), you can usually configure an ad-blocker to allow ads selectively, and still starve the time vampires you stumble upon.

6. Use site-blocking browser plugins

Self-restraint is a great thing to cultivate, but most of us would admit to needing a little help. And the wonderful thing about the information age is that there are tools to help you… automatically.

You can use site-blocker plugins to block whole sites that you know are full of trash, or just to restrict the amount of time that you waste at them. For example, BlockSite for Chrome and BlockSite for Firefox can block sites at specified times and days of the week. Also consider StayFocusd for Chrome, which tracks and limits the time you waste as well as providing a “nuclear option” to block all “bad” sites or allow only “good” sites for a few hours to help you concentrate. LeechBlock for Firefox appears similar. These double as procrastination-blockers, useful beyond simple avoidance of attention viruses.

Consider blocking all of the most addictive sites on the internet or pretty much anything linked from BuzzFeed (they recommend blocking themselves!). Or just look through your browser history to see where the time goes.

7. Filter your news sources

The easiest way to save money is to have it automatically deducted from your paycheck. You don’t miss the money you never see. Similarly, the easiest way to reserve your attention for worthy topics is to block ones you know tend to be trash. You don’t have to decide to ignore the tripe you never see.

Spam, trolls, and general “noise” have all been with us since the dawn of the internet. Newsreaders on Usenet used killfiles to automatically ignore posts. Once email became a cesspool of spam and phishing, filtering became a standard tool there too (some services automating it with great sophistication). Social networking may take a little longer because, frankly, Facebook’s profits are built on sucking you in and using your friends list and interests to advertise to you. It’s unlikely they’ll be providing useful automated filters of any variety soon.

Sick of the clutter on Facebook? Try F.B. Purity. It’s a browser plugin that essentially rewrites the Facebook interface, allowing you to filter out what you don’t want (hopefully this post has given you some good ideas). It’s pretty easy to install; just be aware that Facebook’s interface is changing all the time, and when it does you may experience bizarre glitches due to a mismatch between what Facebook provides and what F.B. Purity expects, at which point you’ll need to (1) recognize what’s going on, (2) disable FBP or live with the glitches for a while, and (3) update FBP once a fix is released. So this isn’t for everyone, but it’s what’s available right now. Perhaps other social networks that aren’t as invested in cramming junk in your face will lead the way in enabling filtering, forcing Facebook to do the same. Or perhaps Facebook will become irrelevant. I don’t know what will happen, but if users don’t start valuing the ability, it won’t appear of its own accord. I suggest taking matters into your own hands.

Other sources often do provide methods of filtering, or there may be a browser plugin to enable it. Search for these and use them.

Irony

Am I aware of the irony/hypocrisy inherent in this post at multiple levels? Yes; yes, I am.

But now you know. If you make this post the last time you’re ever roped in by these tactics, I can die happy.

Now share this with all your friends, and leave some comments below!

Response to “PaaS for Realists” re OpenShift

I work on OpenShift but would not claim to be the voice of OpenShift or Red Hat. I came across PaaS for Realists the other day and thought it could use a quick response. No one is vetting it but me :)

There are some good points in this article. Just like other technology, a PaaS does not magically solve anything. Actually it carries forward all of the flaws of the technology it includes, and then adds a layer of complexity on top. The PaaS does a lot of work for you, and that can be really nice, but it can get in the way too, and the operational model is not always a good fit. It’s important to know what you’re getting and what you’re giving up by employing such a solution.

I wish I could just annotate the article, but quote and response will have to do…

Magical autoscaling

As I said in my previous post, this really doesn’t exist. Your application has to be DESIGNED to scale this way.

Agreed; auto-scaling is a difficult problem to tackle, and if your application isn’t designed for stateless, horizontal scaling up and down, it’s just not going to work well. This isn’t really specific to PaaS.

I would note, by the way, that while OpenShift does enable auto-scaling web frameworks according to a default algorithm, you can either override the algorithm or just disable it and scale manually, whatever works best for your app. One size does not fit all. Sticky sessions are built in if you need that, though sessions are only clustered with the JBoss EAP cartridge (so for all others, in-memory or on-disk sessions will be lost if you scale down or lose a gear).

Magical Autorecover

Just like autoscaling, this is also not what you think it is. Unless your application maintains exactly ZERO state, then you will never see this benefit.

OpenShift doesn’t really claim magical autorecovery or self-healing (yet). Agreed, if you’re storing state on a gear, it can be lost to an outage. Scaling just means you have copies of your webapp. If you want to maintain state in a production setting, you’ll need to store it off-PaaS. You would need to set up something to account for outages in a more traditional deployment too; OpenShift just doesn’t do anything special to help (yet).

I’m sure someone will call me on this and I’m willing to listen but I do know for a fact that the autofailover model of things like your MySQL instance depend on migratable or shared storage (at least from my reading of the docs).

Databases can generally be made HA via replication of some variety. We’ve done some R&D on making database cartridges scalable in order to provide HA storage to an app; it will probably happen at some point – definitely a weakness right now. For now, if you want an HA database, set it up outside the PaaS. You would have had to do that without a PaaS anyway, and your DBAs are already pretty good at it, right? What OpenShift *does* get you is the ability to develop your app on the PaaS against a throwaway DB, then when you want to go to production on the PaaS, you push exactly the same code to your production app and just set some environment variables to point to the production DB.
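As a rough sketch of that hand-off using the rhc client (the variable names here are hypothetical; your app reads whatever you define), it can be as small as:

    # Hypothetical variable names: point the app at the production DB
    # instead of the throwaway in-PaaS one, then restart to pick them up.
    rhc env set PROD_DB_HOST=db.example.com PROD_DB_NAME=myapp PROD_DB_USER=myapp -a myapp
    rhc app restart -a myapp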

Same thing for storage; if you want HA storage, set it up outside the PaaS. This is even trickier to solve in-PaaS than DBs, but we’re hoping to address it based on the version of NFS that just came out with RHEL 7.

Also one of the more hilarious bits I’ve found is the situation with DNS. I can’t count the number of shops where DNS changes where things like wildcard DNS were verboten. Good luck with the PaaS dyndns model!

OpenShift doesn’t need wildcard DNS, but it does use DDNS, and that can definitely be a sticking point when demoing in organizations where even delegating a subdomain is a bureaucratic battle. But at least it’s a battle you only have to fight once, at solution deployment, instead of for every single app you deploy. Do you have a better suggestion for how to dynamically make apps available? Most places are even less willing to provide a pool of IPs to burn per app than they are to allow DDNS.

Operational Immaturity

Any tool that uses MongoDB as its persistent datastore is a tool that is not worth even getting started with. You can call me out on this. You can tell me I have an irrational dislike of MongoDB.

You have an irrational dislike of MongoDB :)

Well alright, not totally irrational. The default write concern for earlier versions of the mongo clients left something to be desired if you cared about DB integrity, we’ve encountered memory leaks when using SSL connections, and we haven’t made the leap to MongoDB 2.6 yet. I’m sure you have more horror stories to share.

But the fact is, we’ve been using MongoDB as the core of OpenShift for years now – both in the public service and for our private solution – and it has seriously been very solid. Our ops guys said (and I quote) “We had a couple of outages early on that were traced back to mongo bugs but generally we don’t even think about it. Mongo just keeps ticking and that’s fantastic.”

Was the use of MongoDB your only criticism here? I think we do provide pretty thorough instructions on how to make the OpenShift infrastructure solid.

Additionally I’ve found next to zero documentation on how a seasoned professional (say a MySQL expert) is expected to tune the provisioned MySQL services. The best I can gather is that you are largely stuck with what the PaaS software ships in its service catalog. In the case of OpenShift you’re generally stuck with whatever ships with RHEL.

Tuning is a definite weakness. It is hard to both provide a software setup you don’t have to think much about and also allow you to administer it. This is a known concern that I think we will work toward addressing in the Docker-based next generation.

You’re not stuck with whatever ships with RHEL on OpenShift. Publicly and privately we’ve added support for several SCLs (Software Collections), which can provide completely new platforms or just different versions than what ships with RHEL. You can also add a custom cartridge to run pretty much any technology that speaks HTTP (and depending on your needs, many that don’t).

Another sign of operational immaturity I noticed in OpenShift is that for pushing a new catalog item you actually have to RESTART a service before it’s available.

Do you mean, to add a cartridge, you need to restart something? If so, that’s not really true, although it was in the past, and I’m not sure we’ve clarified the directions well enough since.

Disaster Recovery

After going over all the documentation for both tools and even throwing out some questions on twitter, disaster recovery in both tools basically boils down to another round of “good luck; have fun”.

[…]

Again based on the research I’ve done (which isn’t 1000% exhaustive to be fair), I found zero documentation about how the administrator of the PaaS would back up all the data locked away in that PaaS from a unified central place.

The OpenShift infrastructure just depends on redundancy for HA/DR. Hopefully that’s pretty straightforward given the directions.

To make your applications recoverable, use highly-available storage for the nodes and back it up. There’s not a great deal of detail in that section of docs, but does there need to be?

Affinity

Affinity issues make the DR scenario even MORE scary. I have no way of saying “don’t run the MySQL database on the same node as my application”.

Well, that’s true, but with OpenShift you *do* have the ability to define availability zones and the gear placement algorithm will ensure that your scaled app is spread across them. Once we get scaled DB cartridges I expect the same will apply for them (see above re “in-PaaS DBs aren’t for production yet”). And if that’s not good enough for you, we have a hook to customize the gear placement algorithm until it is good enough for you.

Unless your engineering organization is willing to step up to the shared responsibility inherent in a PaaS, then you definitely aren’t ready. Until then, your time and money is better spent optimizing and standardzing your development workflow and operational tooling to build your own psuedo-PaaS.

Agreed, your developers are not going to just be able to ignore that they’re working and deploying on a PaaS. It’s not a magical solution. It’s a specific way to provide services, with pros and cons all its own, and that context needs to be understood by all stakeholders.

It *may* be possible to create your own PaaS specific to your needs and be happier with it than you would be purchasing someone else’s solution. But I will say that we have run into a lot of forward-thinking companies that did exactly this within the last few years, and now are desperate to get away from maintaining that solution. Keeping up with ever-churning platforms and security updates always takes more work than anyone expects. So if you think PaaS is right for you, also ask yourself: do you want to be in the building-a-PaaS business, or the using-a-PaaS business?

OpenShift logging and metrics

Server logs aren’t usually a very exciting topic. But if you’re a sysadmin of an OpenShift Enterprise deployment with hundreds of app servers coming and going unpredictably, managing logs can get… interesting. Tools for managing logs are essential for keeping audit trails, collecting metrics, and debugging.

What’s new

Prior to OpenShift Enterprise 2.1, gear logs were simply written to log files on the gear. Simple and effective. But this is not ideal for a number of reasons:

  1. Log files take up your gear storage capacity. It is not hard at all to fill up your gear with logs and DoS yourself.
  2. Log files go away when your gear does. Particularly for scaled applications, this is an unacceptable loss of auditability.
  3. Log file locations and rotation policies are at the whim of the particular cartridge, thus inconsistent.
  4. It’s a pain for administrators to gather app server logs for analysis, especially when they’re spread across several gears on several nodes.

OSE 2.1 introduced a method to redirect component and gear logs to syslogd, which is a standard Linux service for managing logs. In the simplest configuration, you could have syslog just combine all the logs it receives into a single log file (and define rotation policy on that). But you can do much more. You can filter and send log entries to different destinations based on where they came from; you can send them to an external logging server, perhaps to be analyzed by tools like Splunk. Just by directing logs to syslog we get all this capability for free (we’re all about reusing existing tools in OpenShift).
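For example, here is a minimal rsyslog sketch (the file name and log host are made up) that peels the frontend access-log entries, tagged “openshift-node-frontend” as described below, off to a central server while also keeping a local copy:

    # /etc/rsyslog.d/openshift-forward.conf (hypothetical path and log host)
    # Forward frontend access-log entries over TCP and also write them locally.
    :syslogtag, startswith, "openshift-node-frontend" @@loghost.example.com:514
    :syslogtag, startswith, "openshift-node-frontend" /var/log/openshift-frontend.log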

Where did that come from?

Well, nothing is free. Once you’ve centralized all your logging to syslogd, then you have the problem of separating entries back out again according to source so your automation and log analysis tools can distinguish the logs of different gears from each other and from other components. This must be taken into account when directing logs to syslogd; the log entries must include enough identifying information to determine where they came from down to the level of granularity you care about.

We now give instructions for directing logs to syslog for OpenShift components too; take a look at the relevant sections of the Administration Guide for all of this. Redirecting logs from OpenShift components is fairly straightforward. There are separate places to configure if you want to use syslog from the broker rails application, the management console rails application, and the node platform. We don’t describe how to do this with MongoDB, ActiveMQ, or httpd, but those are standard components and should also be straightforward to configure as needed. Notably absent at this point are instructions for syslogging the httpd servers hosting the broker and console rails apps; but the main items of interest in those logs are error messages from the actual loading of the rails apps, which (fingers crossed) shouldn’t happen.

Notice that when configuring the node platform logging, there is an option to add “context”, which is to say the request ID and app/gear UUIDs where relevant. Adding the request ID allows connecting what happened on the node back to the broker API request that spawned the action; previously this request ID was often shown in API error responses, but was only logged in the broker log. Logging the request ID alongside the resulting node actions in syslog now makes it a lot easier to get the whole picture of what happened with a problem request, even if the gear was destroyed after the request failed.
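For the node platform, the syslog and context settings live in node.conf. The key names below are only as I remember them, so treat them as assumptions and check the Administration Guide for the exact spelling:

    # /etc/openshift/node.conf (key names from memory; verify against the docs)
    PLATFORM_LOG_CLASS=SyslogLogger
    PLATFORM_LOG_CONTEXT_ENABLED=1
    PLATFORM_LOG_CONTEXT_ATTRS=request_id,app_uuid,container_uuid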

Distinguishing gear logs

There are gear logs from two sources to be handled. First, we would like to collect the httpd access logs for the gears, which are generated by the node host httpd proxy (the “frontend”). Second, we would like to collect logs from the actual application servers running in each gear, whether they be httpd, Tomcat, MongoDB, or something else entirely.

Frontend access logs

These logs were already centralized into /var/log/httpd/openshift_log and included the app hostname as well as which backend address the request was proxied to. A single httpd option, “OpenShiftFrontendSyslogEnabled”, adds logging via “logger”, which is the standard way to write to the syslog. Every entry is tagged with “openshift-node-frontend” to distinguish frontend access logs from any other httpd logs you might write.

With 2.1, the node can also look up and log the app and gear UUIDs, because hostnames and backend addresses aren’t reliable identifiers on their own. A single application may have multiple aliases, so it is hard to automatically collate all log entries for a single application by hostname. An application could also be destroyed and re-created with the same address, even though it is technically a different app from OpenShift’s viewpoint. And the same application may have multiple gears, those gears may come and go or be moved between hosts, and the backend address for a gear could be reused by a different gear after it has been destroyed.

In order to uniquely identify an application and its gears in the httpd logs for all time, OSE 2.1 introduces the “OpenShiftAnnotateFrontendAccessLog” option which adds the application and gear UUIDs as entries in the log messages. The application UUID is unique to an application for all time (another app created with exactly the same name will get a different UUID) and shared by all of its gears. The gear UUID is unique to each gear; note that the UUID (Universally Unique ID) is different from the gear UID (User ID) which is just a Linux user number and may be shared with many other gears. Scale an application down and back up, and even if the re-created gear has the same UID as a previous gear, it will have a different UUID. But note that if you move a gear between hosts, it retains its UUID.

If you want to automatically collect all of the frontend logs for an application from syslog, set the “OpenShiftAnnotateFrontendAccessLog” option and collect entries by application UUID. Then your httpd log entries look like this:

Jun 10 14:43:59 vm openshift-node-frontend[6746]: 192.168.122.51 php-demo.openshift.example.com - - [10/Jun/2014:14:43:59 -0400] "HEAD / HTTP/1.1" 200 - "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2" (3480us) - 127.1.244.1:8080/ 53961099e659c55b08000102 53961099e659c55b08000102

The “openshift-node-frontend” tag is added to these syslog entries by logger (followed by the process ID which isn’t very useful here). The app and gear UUIDs are at the end there, after the backend address proxied to. The UUIDs will typically be equal in the frontend logs since the head gear in a scaled app gets the same UUID as the app; they would be different for secondary proxy gears in an HA app or if you directly requested any secondary gear by its DNS entry for some reason.
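For reference, both options are plain httpd directives in the node’s frontend configuration; a sketch (the exact conf file, and whether the value should be “true” or “on”, may differ on your install, so check the guide):

    # Node frontend httpd configuration (exact file varies by install)
    OpenShiftFrontendSyslogEnabled true
    OpenShiftAnnotateFrontendAccessLog true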

Gear application logs

In order to centralize application logs, it was necessary to standardize cartridge logging such that all logs go through a standard mechanism that can then be centrally configured. You might think this would just be syslog, but it was also a requirement that users should be able to keep their logs in their gear if so desired, and getting syslog to navigate all of the permissions necessary to lay down those log files with the right ownership proved difficult. So instead, all cartridges now must log via the new utility logshifter (our first released component written in Go, as far as I know). logshifter will just write logs to the gear app-root/logs directory by default, but it can also be configured (via /etc/openshift/logshifter.conf) to write to syslog. It can also be configured such that the end user can choose to override this and have logs written to gear files again (which may save them from having to navigate whatever logging service ends up handling syslogs when all they want to do is debug their app).

Here, distinguishing which gear created each log entry requires somewhat more drastic measures. We can’t trust each gear to self-report accurately (a gear could claim its log traffic came from another gear, or from something else entirely). So the context information is added by syslog itself via a custom rsyslog plugin, mmopenshift. Properly configuring this plugin requires an update to rsyslog version 7, which (to avoid conflicting with the version shipped in RHEL) is actually shipped in a separate RPM, rsyslog7. So to usefully consolidate gear logs into syslog really requires replacing your entire rsyslog with the newer one. This might seem extreme, but it’s actually not too bad.

Once this is done, any logs from an application can be directed to a central location and distinguished from other applications. This time the distinguishing characteristics are placed at the front of the log entry, e.g. for the app server entry corresponding to the frontend entry above:

2014-06-10T14:43:59.891285-04:00 vm php[2988]: app=php ns=demo appUuid=53961099e659c55b08000102 gearUuid=53961099e659c55b08000102 192.168.122.51 - - [10/Jun/2014:14:43:59 -0400] "HEAD / HTTP/1.1" 200 - "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"

The example configuration in the manual directs these to a different log file, /var/log/openshift_gears. This log traffic could be directed to /var/log/messages like the default for everything else, or sent to a different destination entirely.

Gear metrics

Aside from just improving log administration capabilities, one of the motivations for these changes is to enable collection of arbitrary metrics from gears (see the metrics PEP for background). As of OSE 2.1, metrics are basically just implemented as log messages that begin with “type=metric”. These can be generated in a number of ways:

  • The application itself can actively generate log messages at any time; if your application framework provides a scheduler, just have it periodically output to stdout beginning with “type=metric” and logshifter will bundle these messages into the rest of your gear logs for analysis (see the sketch after this list).
    • Edit 2014-06-25: Note that these have a different format and tag than the watchman-generated metrics described next, which appear under the “openshift-platform” tag and aren’t processed by the mmopenshift rsyslog plugin. So you may need to do some work to have your log analyzer consider these metrics.
  • Metrics can be generated passively by the openshift-watchman service in a periodic node-wide run. This can generate metrics in several ways:
    • By default it generates standard metrics out of cgroups for every gear. These include RAM, CPU, and storage.
    • Each cartridge can indicate in its manifest that it supports metrics, in which case the bin/metrics script is executed and its output is logged as metrics. No standard cartridges shipped with OSE support metrics at this time, but custom cartridges could.
    • Each application can create a metrics action hook script in its git repo, which is executed with each watchman run and its output logged as metrics. This enables the application owner to add custom metrics per app.
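For the first of those options, here is a minimal sketch of an app emitting its own metrics. The metric name and the 60-second interval are made up; the only real requirement is that whatever your framework’s scheduler writes to stdout begins with “type=metric”:

    import sys
    import threading
    import time

    def emit_metrics():
        # Emit one metric line per minute; logshifter sweeps stdout into the
        # gear logs, and lines beginning with "type=metric" are treated as metrics.
        while True:
            pending = get_pending_jobs()  # replace with a real measurement
            sys.stdout.write("type=metric myapp.jobs_pending=%d\n" % pending)
            sys.stdout.flush()
            time.sleep(60)

    def get_pending_jobs():
        return 0  # placeholder for illustration

    t = threading.Thread(target=emit_metrics)
    t.daemon = True
    t.start()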

Note that the cartridge and action hook metrics scripts have a limited time to run, so that they can’t indefinitely block the metrics run for the rest of the gears on the node. All of this is configurable with watchman settings in node.conf. Also note that watchman-generated logs are tagged with “openshift-platform”, e.g.:

Jun 10 16:25:39 vm openshift-platform[29398]: type=metric appName=php6 gear=53961099e659c55b08000102 app=53961099e659c55b08000102 ns=demo quota.blocks.used=988 quota.blocks.limit=1048576 quota.files.used=229 quota.files.limit=80000

The example rsyslog7 and openshift-watchman configuration will route watchman-generated entries differently from application-server entries since the app UUID parameter is specified differently (“app=” vs “appUuid=”). This is all very configurable.
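If your log analysis tooling doesn’t already speak key=value pairs, pulling them out of either flavor of metric line is easy enough; a throwaway sketch:

    import re

    def parse_metric_line(line):
        # Collect the key=value tokens from a syslog line like the watchman
        # example above; the timestamp, host, and tag are simply ignored.
        return dict(re.findall(r"(\S+)=(\S+)", line))

    sample = ("Jun 10 16:25:39 vm openshift-platform[29398]: type=metric appName=php6 "
              "gear=53961099e659c55b08000102 app=53961099e659c55b08000102 ns=demo "
              "quota.blocks.used=988 quota.blocks.limit=1048576 "
              "quota.files.used=229 quota.files.limit=80000")
    print(parse_metric_line(sample)["quota.blocks.used"])  # prints 988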

I am currently working on installer options to enable these centralized logging options as sanely as possible.