17 July 2017

Experimenting with Phalcon on Docker

Phalcon is a PHP framework that is implemented as a web server extension written in Zephir and C, aiming to boost execution speed, reduce resource usage, and handle more HTTP requests per second than comparable frameworks written primarily in PHP.

Phalcon logo
Is it really that much faster?

According to the benchmarks produced by the Phalcon team (here) it is significantly faster than other frameworks.  Of course there are other benchmarks available (like this one) which also find that it is significantly faster than traditional MVC frameworks.

Since the framework is an extension built in Zephir it's extremely fast and efficient.  Zephir offers native code generation (currently via compilation to C), which makes running it a lot faster than running plain PHP.

Phalcon has consistently been at the top of the performance charts for several years.

It's definitely worth experimenting with, and luckily they provide a full Docker stack that lets you get up and running in no time.

Of course it's important for me that I'm able to use Blackfire in my development environment so that I can profile it and measure performance changes.

To do this I used the instructions provided by Blackfire (here) and made just a few modifications.

To start with I amended the Phalcon docker file in docker/app/Dockerfile to include the command to install the Blackfire extension, like this:
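The exact commands come from Blackfire's install guide and may have changed since; the addition was a sketch along these lines (the release URL and extension directory are as documented by Blackfire at the time):

```dockerfile
# Install the Blackfire probe (per Blackfire's Docker instructions;
# verify the URL and paths against their current documentation).
RUN version=$(php -r "echo PHP_MAJOR_VERSION.PHP_MINOR_VERSION;") \
    && curl -A "Docker" -o /tmp/blackfire-probe.tar.gz -D - -L -s \
        https://blackfire.io/api/v1/releases/probe/php/linux/amd64/$version \
    && mkdir -p /tmp/blackfire \
    && tar zxpf /tmp/blackfire-probe.tar.gz -C /tmp/blackfire \
    && mv /tmp/blackfire/blackfire-*.so $(php -r "echo ini_get('extension_dir');")/blackfire.so \
    && rm -rf /tmp/blackfire /tmp/blackfire-probe.tar.gz
```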

Then I added the configuration into conf/php/fpm.ini like this:
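The ini addition loads the probe and points it at the agent; the hostname `blackfire` is an assumption about the Docker service name on your network:

```ini
; Load the Blackfire probe and tell it where to find the agent.
extension=blackfire.so
blackfire.agent_socket=tcp://blackfire:8707
```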
And then of course I needed to include the Blackfire docker service, so I added it into the docker-compose.yml file like this:
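A sketch of the service definition; the server ID and token come from your own Blackfire account:

```yaml
# Blackfire agent container; credentials are placeholders.
blackfire:
  image: blackfire/blackfire
  environment:
    - BLACKFIRE_SERVER_ID=your-server-id
    - BLACKFIRE_SERVER_TOKEN=your-server-token
```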

Now when I bring up my Docker stack I have a fully functional Phalcon installation and my Blackfire profiling works right out of the box.
Needless to say, I'm very happy with the performance that the Phalcon "Micro" app is showing.

29 June 2017

Context and caching

Image: Pixabay "Storage"
In this post I'm going to take a look at how we can improve performance by avoiding having our code run at all.

Let's assume that we have users from different parts of the world accessing our application.  We've decided to deploy through a CDN so that people who are a long way from our server are not subjected to waiting for the network.  The CDN will be caching requests at the edge locations close to users.  It will only pass a request to our load-balancer if the request cannot be served out of its cache.

Having the CDN be able to respond to the user request is the best-case scenario because the network distance is reduced and our infrastructure doesn't have to worry about the request at all.  The user in Hong Kong will receive their response just as quickly as a user in London, where we're hosting our server.

If we were not using a CDN we would probably want to use a caching load balancer like Varnish or Nginx, but in our case let's assume that the load-balancer just passes the request on to a server.

The request reaches a web-server which will also decide whether or not it can respond with a cached result.  If our web-server is able to respond without hitting PHP we're still doing okay for performance, because web servers are pretty quick off the mark at returning cached content.  If our web server is struggling to return cached content fast enough we can just stick it onto a bigger server or add more servers to our cluster.  In other words, if we can arrange things so that most of the requests reaching our web-server never reach PHP, scaling becomes a lot simpler.

If the web-server cannot serve the request out of cache then it will need to pass it on to a PHP-FPM worker.  The PHP worker will read the bytecode it needs to run the program out of OpCache, or it will read the file(s) off disk and compile them.  It will run the program and see if it can read parts of what it needs out of our Redis/Memcached cache store.  If it can't then it will make queries against the database.

Our caches are intended to avoid the expense of I/O operations (network and disk) and processing time (database and PHP processing).  The closer we can cache to the user the more we can save.

The problem with context

The context of a program is the minimal set of data that you need to store in order to be able to resume processing in case it is interrupted.  It's what you need to know to be able to build up the state of the application.

If I know the context of an application then I should be able to interrupt it and later rebuild its state and allow it to continue running.  If I do so I should expect the same results that I would have had if I just left it to run.

So why should context be a problem?  Surely the ability to predict what code does allows us to do important things like writing test assertions and debugging?

In the PHP world a program's initial context comes from input and your session.   Input can be supplied through your request variables, and through cookies. We run through our code and come out with an entirely predictable and (most importantly) cacheable result.

A program that is run twice with the exact same initial context should produce the same result each time.  This means that you can confidently store the result of running the program knowing full well that the next time the program is run with that particular context you can just return the result
that you already know.

This leads to the observation that you can only cache the output of your code within the context in which it was run.

Practical example

Let's imagine an API that a client calls to list upcoming calendar events.  The client is displaying an events calendar to a user.

Because we're such nice people we've added lots of functionality to our API to make life easier for the client.  We've allowed the client to specify a starting date and an ending date so that they can display a week or a month at a time. We're also allowing the client to specify an event category so that it can  present a drop-down box for the user.  Lastly, we allow the client to specify
a sort order that they would like to receive their data in.

A call to our API might look something like this:
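Something along these lines, with all five parameter names being my own illustration:

```http
GET /api/events?start=2017-06-01&end=2017-06-30&category=music&sort=date&order=desc
```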


When the call starts it has five variables in its context.  If we cache this call in Varnish/Nginx/CDN then we'll be caching the result of calling our API with an extremely specific context.

If another person wanted to see events for the same date range, same sort order, but for a different category we would need to run the PHP application and cache that response. Even a small change in the initial context means that we can't use our cache at all.

If a third user wanted to see exactly the same information as either of the other users but wanted to see it in ascending order we would need to generate the results for them specifically.

You can see that as you add users you're going to have an increasing number of possible combinations.  The fact that we're allowing arbitrary dates to be selected makes this especially problematic.

With the possibility of so many different contexts we can be pretty sure that we're going to be missing our outer HTTP caches a whole lot.

Load balancing between your server and the user's device

We can write very powerful API endpoints that allow a lot of flexibility.  In our example the client would be able to get the information it needs in the format that it needs because we're doing all of the work in PHP on the server.

What if we were to delegate some of the processing responsibility over to the user device that is consuming the API?

If we could get our client to receive an unformatted list of data that they then sort and slice into the format that they need then we wouldn't need to worry about doing that on the server.

More importantly, and this is the crux, we could reduce the variability of the context in which transactions occur.  This makes it much easier to cache our results in our edge caches and avoid having our PHP run at all.

By asking the client to take on the load of transforming the data we're allowing ourselves to respond in a generic way to the request.  The client is now responsible for managing the context of the application which frees up the server to work on managing data.
This is why frameworks like ReactJS and Angular are so powerful for the web,
they make it possible to manage context for complex applications in the client.

So instead of offering all of the parameters that it previously did, we could refactor our API to allow the client to specify only the number of the week for which it wants results.  The backend will calculate the date that the week starts on and return events for that week.
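The week-to-date calculation is nearly a one-liner in PHP; a minimal sketch, assuming ISO-8601 week numbering:

```php
<?php
// Given an ISO-8601 week number, find the Monday that week starts on.
function weekStart(int $year, int $week): string
{
    $date = new DateTime();
    $date->setISODate($year, $week); // day defaults to 1, i.e. Monday
    return $date->format('Y-m-d');
}

echo weekStart(2017, 27); // 2017-07-03
```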

Obviously you would need to look at your use case when trying to generalize your API.  If your clients typically need a month at a time then sending them a month worth of data at a time might be better than asking them to make four or five requests of a week each.  If the amount of data being transferred is overly big you might look at splitting your API into an endpoint that gives a list of
events and another endpoint that gives details of events.

Cookies and caching

Both Nginx and Varnish will, in their default configuration, not cache an object coming from the backend with a Set-Cookie header present.  It is possible to override this setting, but it's preferable to avoid using cookies at all.

Why avoid cookies?  The short answer is that cookies are associated with context because they're being used either to provide input parameters or to indicate session variables.

Cookies are the traditional way to store a user's session identifier.  We then use the session information to identify the user who authenticated themselves previously.  Often frameworks that use cookie based sessions (like Laravel) will also set a cookie for a user who has not been authenticated.  This allows us to maintain session state for the user on the server.

The purpose of a cookie is to associate a request with a particular session state.  In other words the cookie is an indication that you're using session stores.  If you're confident that your responses can be cached because there is no context coming from the cookie or session then you can configure your caches to ignore them.

The outer caches will see that we're attaching a cookie to our response and will assume that we intend to use that cookie in the future.  If you are confident that the cookie is not part of the program context you can instruct the cache to disregard the cookie and store the response regardless.
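In Nginx, for example, a proxy cache can be told to disregard the backend's cookie.  A sketch (zone and upstream names are placeholders; only do this when you're sure the cookie carries no context):

```nginx
location / {
    proxy_cache my_zone;
    # Cache the response even though the backend attaches a cookie.
    proxy_ignore_headers Set-Cookie;
    # Don't pass the cookie on to clients from the cached copy.
    proxy_hide_header Set-Cookie;
    proxy_pass http://backend;
}
```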

Can a RESTful API use cookies?

There is a lot of argument about whether cookies are "RESTful" and I think it
really boils down to what you're storing in the session that the cookie
identifies.  Using JWT also requires you to use an HTTP header and a cookie
is also just an HTTP header.  I can't see any fundamental difference when
it comes to using either method to identify a user, but if you're storing
state other than user authentication in the session then that could be seen
as problematic for being RESTful.

Another reason to avoid cookies is because browsers are just one sort of client that you can expect to be using your API.  You might find that cookie based user identification is very inconvenient for somebody writing a Java application to run on Android.

If you are going to use cookies then you'll need to look at making sure that your edge caches know when to disregard them and cache a response regardless of it having a cookie attached.  To me this feels like more work than avoiding them entirely.

If you're using cookies to manage sessions then consider swapping to a different way to authenticate your user.  JWT tokens are very convenient and will let you avoid needing to worry about reconfiguring your caches.

So how can we avoid context?

Avoid session variables

You do not want to persist session state between calls.  The session state forms part of your context and will affect the results of running your program.  This complicates your caching, because you can only cache within the entire context of your transaction.  Your cached results will only be meaningful to the specific case where the user has that exact session state.

The context of your API needs to be limited to the input parameters that it gets from the HTTP request.  Let the client maintain the state of the application and let your API server worry about supplying the data that the client requests.

Dumb down your API

Don't have your API accept every parameter under the sun and serve up responses that are specific to an application context.  Let your API provide information that can be reused in different application states and cached on the edges of your network.

You might need to rewrite portions of your API and update your clients that consume them.  The performance gains you obtain from this will be significant and will dwarf any efforts you make to optimize your code.  Not running your code at all will always be faster than running optimized code.

The effect of dumbing down your API is to limit the variability of the request parameters passed to it.  By limiting the variability you increase the chance that requests will match a previously cached version.


The entire aim of a RESTful API is to maintain stateless operations.

Scalability is one of the big wins of having a RESTful API.  Sure you'll enjoy other benefits, but for the purposes of this chapter, if you make your API RESTful then you'll avoid context and allow your responses to be cached.

There's a lot of focus on REST with regards to verbs and structuring your endpoints.  All of that is good but when it comes to performance don't lose sight of the goal of stateless operations.

24 June 2017

Gleaming the scale cube

Image: Pixabay "cubism"
This concept was introduced in a book called "The Art of Scalability" and it's a very useful way of looking at your application.  It encourages you to picture your application capacity as a cube.  In this analogy, if we increase the width, height, or breadth of the cube then we increase our application capacity.

It's particularly useful to consider while you're modelling your domain with a view to decomposing a monolithic application.  Eric Evans' book "Domain-Driven Design" describes the concept of bounded contexts, and this plays very nicely into the scale cube concept.

X-scaling, or horizontal scaling

The first dimension we're going to look at is horizontal scaling. This is where we make our cube longer along the X-axis.

When we scale horizontally we create multiple copies of the application and then use a load-balancer to split up the traffic between them.

This is a very easy way to scale your application but it does have some limitations. Firstly it is costly to add more compute to your stack and at some point it won't make financial sense for you to add a new server to your cluster. Secondly, this approach does not address the architecture of your application and as it grows bigger and bigger it becomes harder and harder to manage its performance.

Load balancer
Horizontal scaling is cheap and easy in the early stages but offers a diminishing return that means you'll hit a limit. You can extend that limit and increase the amount of return by using the other two scaling methods in conjunction with horizontal scaling.


Y-scaling, or functional decomposition

Y-scaling is the act of splitting your application into multiple components, each of which is responsible for a set of related functions. This can be called modular programming because it breaks your monolithic application into smaller modules which work together to offer the application functionality.

Data sharding
You might not see how this can help your application scale, but in general a small piece of code that is focused on one thing will do that thing quicker than a monolithic application that tries to do everything.

By having your application split into smaller components you'll be able to perform horizontal scaling at a more granular level. You'll find that not all parts of your application are equally busy. For example a feature that lets users automatically tweet what music they're listening to might not be used nearly as much as a recommendation engine that suggests music to them.

Instead of having to create a new copy of your entire application you'll be able to create new instances of the individual components that are experiencing load. This improves the cost efficiency of horizontal scaling and extends the limit to which you can afford to scale up.


Z-scaling, or data partitioning

Z-scaling involves splitting up responsibility for different parts of the data. You deploy multiple copies of your application, but each one becomes responsible for just a part of the data set. In other words you partition your data and then assign a server to a particular group of your data.

For example, you could set up three different servers each of which is devoted to serving only certain customers based on their surname. When a request arrives at your application your router will determine which of the servers is intended for that user and send the request accordingly.
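A toy illustration of such a routing rule; the boundaries and hostnames are invented for the example:

```php
<?php
// Route a customer to a shard based on the first letter of their surname.
function shardFor(string $surname): string
{
    $initial = strtoupper($surname[0]);
    if ($initial <= 'H') {
        return 'shard-a.example.org'; // A-H
    }
    if ($initial <= 'P') {
        return 'shard-b.example.org'; // I-P
    }
    return 'shard-c.example.org';     // Q-Z
}

echo shardFor('Evans'); // shard-a.example.org
```

The important property is that the routing is deterministic: the same surname always lands on the same shard.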

If you compare this to X-scaling you'll notice a similarity in that you have deployed multiple identical copies of your application. The difference is that now instead of having a load balancer send traffic to any server you will be consistently routing traffic to a particular server based on your sharding rules.

There are several downsides to Z-scaling. It doesn't help with reducing your application complexity because a full version is still deployed on each server. In fact it's probably going to make your application more complicated to understand because you have the additional problem of having your data in pieces.

The advantage of sharding is that it gives you another way to tune the application stack to match the load. It also has the effect of isolating failure to one part of your application; if a shard goes down then traffic that is destined for other shards will continue to flow. This improves the fault tolerance of your architecture.

Bringing it together

The scale cube is useful for looking at different dimensions and opportunities for scaling. Horizontal scaling is the cheapest option initially but offers diminishing returns and eventually will no longer be an option. The other two dimensions are more expensive to perform but will be necessary for you to offer a truly scalable solution.

16 June 2017

10,000 simultaneous visitors on a €4 server

Image: Pixabay
I thought it would be useful to show how effective low cost solutions can be at scaling up.  Even if you don't have a huge budget you can still have a site that can handle respectable volumes of traffic.

Of course 10,000 simultaneous requests would correspond to a much higher number of visitors on your site.  When I'm generating traffic I'm not trying to simulate how long a person views a single page before requesting the next one.  On most sites at any given time most visitors will be looking at content rather than requesting new content.

To show how to get 10,000 concurrent requests on a cheap server I'm going to use a very cheap VM and hit it with requests from Loader.io. The exact specifications of the VM are on the company's website (here).

It's a very basic single core, 1 GB RAM virtual machine on shared tenancy.  It costs me four Euro a month and it runs my book website and BOINC software in its spare time.  So as far as web-servers go this is a very humble one and it's even being forced to donate its spare compute cycles to help cure cancer and find evidence of neutron stars.

It's running Nginx as a web-server with a tuned instance of PHP7.1-FPM serving up PHP pages.  Nginx serves static assets itself and sets all of the appropriate headers that let devices know they can cache them.

For this particular site the homepage is static and doesn't change for different users, so at the top I'm using PHP to set cache control headers, like this:
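A sketch of what such a header block can look like; the max-age and version values are placeholders, not the site's real ones:

```php
<?php
// Static homepage: any cache may store this and share it between users.
$version = '42'; // bump when the page content changes
header('Cache-Control: public, max-age=604800'); // one week
header('ETag: "' . $version . '"');

// ... render the rest of the page
```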

This indicates to Nginx and other intermediary caches that it's safe to cache the content and give it to any user who asks.  I set an ETag that is based on a static version number because the only time I would want to process the file is if the version changes.

In my Nginx config I'm setting up fastcgi_cache so that I respect the cache control header and only send requests to PHP if my cache is old.  With this setup Nginx will be serving the majority of requests directly and passing very little (if anything) to PHP.
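My real config isn't reproduced here, but a fastcgi_cache setup has this general shape (zone name, paths, and socket are placeholders):

```nginx
# Define a cache zone for FastCGI responses.
fastcgi_cache_path /var/cache/nginx/fcgi levels=1:2
                   keys_zone=phpcache:10m max_size=100m inactive=60m;

server {
    location ~ \.php$ {
        fastcgi_pass unix:/run/php/php7.1-fpm.sock;
        include fastcgi_params;

        fastcgi_cache phpcache;
        fastcgi_cache_key "$scheme$request_method$host$request_uri";
        # Cache-Control/Expires from PHP are honoured by default;
        # this is a fallback lifetime for 200 responses.
        fastcgi_cache_valid 200 10m;
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```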

Nginx is a very capable server, and on my limited and overworked VM it is able to serve up about 1,000 concurrent page requests before it starts having difficulty and giving timeouts.  This is far in excess of the traffic that site is likely to see, so there really is no need to scale other than to see how far I can go.

I decided to increase how aggressively Cloudflare caches my site.  It's a static page so there's no good reason not to cache it as much as possible on the edge locations right?

To do this I go into Cloudflare and set up a special page rule that will override the default way that Cloudflare caches stuff. This rule will force it to cache everything on my site. I can only do this because the site is static and doesn't change for different users.

Next I open up Loader and set up a test. I'm going to ramp up from 1,000 users all the way to 10,000 concurrent connections.  The Loader graph shows that increasing the number of visitors did very little to the response time.

Even with 10,000 users hitting the site I'm responding in under 800 milliseconds.  Well, of course, I'm not really - the CDN is returning all of the data, so my webserver isn't being hit at all.

Of course this is a special case and not every page is static and able to be shared by all of your users.  More advanced rules would let you set more intelligent timeouts and identify which parts of your system are able to be shared between users.  This is a very basic website, however, so we can get away with a lot less.

Here's a video of this in action:

05 June 2017

Using and tuning Opcache

Image: Pixabay
PHP is an interpreted language.  This means that there is an engine which reads
your PHP and converts it to an internal format called opcode.  The engine is
then able to execute the opcode directly.

This is done when the script is run and although compiling is very fast it
still requires some computation and possibly for the file to be read off disk.
PHP uses a single-request-per-script-execution model and so in normal operation
every time your program runs it would need to be converted to opcode.

An opcode cache is one which stores the prepared opcode for a PHP file which
saves the PHP engine from having to interpret the file every time it runs.

There are several different opcode caches, and the reason I'm focusing on
OpCache is because it's baked into default installations of current versions of
the most popular implementation of PHP and does not need the user to
install anything.

Tuning opcache

Now that we have some context about Opcache and what it does, let's look at
how it can be tuned.  We want to understand the trade-offs and constraints
involved in changing the settings.

You can configure the amount of memory it has available and also the number of
files that it is able to store.  There's also a setting that controls the
maximum amount of wasted memory that Opcache will tolerate.  We'll be going
into all of these in more detail.

The default values for Opcache are actually a little bit low if you're using a
framework or a lot of vendor libraries.  You should also remember that some
templating systems generate code for your views so there might be PHP files in
your application that aren't in your source control.

The aim of tuning Opcache is to avoid having it become full and to avoid having
it restart itself.  We'll look into the details of this shortly, but when we're
tuning performance we'll be working within the constraints set out by these
settings.

How does Opcache decide that it is "full"?

Opcache considers itself "full" when it is unable to store more files.  It
reaches this limit either when it runs out of memory or runs out of storage
slots, or keys.

You limit the memory available to Opcache with the `memory_consumption` setting
and the number of keys or storage slots with the `max_accelerated_files` setting.
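These limits live in your ini files.  For a framework-heavy application, something of this shape is common; the numbers are a starting point for experimentation, not a recommendation:

```ini
; Memory for compiled opcode, in megabytes.
opcache.memory_consumption=192
; Maximum number of files (keys) the cache can hold.
opcache.max_accelerated_files=16229
; How much wasted memory Opcache tolerates before restarting when full.
opcache.max_wasted_percentage=10
```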

Recycling the cache

Opcache isn't alone in this problem, and other cache systems can also become
full.  Often a cache will use a system to determine which keys to discard when
it gets full.  For example, it might decide that the "least recently used" piece of
data it is storing should be discarded so that it can store new information.

There are lots of these strategies but Opcache doesn't use any of them and it
never replaces anything in the cache with new information.  The only way in
which Opcache ever replaces information in its cache is to empty it out and
start from scratch.  It throws away all of its cached information and forces
PHP to recompile the files so that it can cache the results again.

So what happens if a file on disk changes?  Will Opcache remove the old entry
from the cache and store the new version instead?

No.  It won't.  It will cache the new version and mark the memory being used
to store the old version as "wasted".  The old version will still be using a slot
in the maximum number of keys Opcache can store and will still be using up
memory allocated to Opcache.

Once Opcache is full it will check the amount of wasted memory and compare it
against the configured `max_wasted_percentage` setting.

If Opcache is wasting more memory than this percentage then it will restart,
emptying the cache and starting from scratch.  If it is wasting less than this
percentage then it will not restart: files that are not in the cache will never
be cached and will always be compiled when they are required.

Detecting when files change

By default Opcache is configured to make sure that the file has not changed
on disk before it returns the cached opcode.  This is great if you expect your
files to be changing, but is probably not the best option for production.

The `validate_timestamps` setting can be set to zero to prevent Opcache from
ever checking disk to see if the file has changed.  This saves on disk I/O and
any saving on any sort of I/O is always a good thing.

Of course this leads to the problem of needing to clear out your cache when you
deploy new code.  The most common solution I've seen to this is to restart the
PHP-FPM worker.  This results in Opcache being emptied out and any current
requests being terminated.  This might be fine if you don't do continuous
deployment and are able to schedule deployments for times when your site is
not so busy.

How could we avoid the situation where we dump all of the people using our site
and start our cache cold?

One way to work around this is to point your web-server document root to a new
location and reload its configuration.  The web-server and PHP-FPM
processes are different and so you're able to change the web-server configuration
without affecting PHP-FPM.

By doing this your web-server will start up new workers for new requests and
not terminate the workers that are busy finishing up the requests that are
currently running.

This still doesn't solve the problem of needing to clear out Opcache so that
your freshly deployed code is used instead of the code in the cache.  Luckily
we are not forced to choose between always checking for a new version and never
checking: we're able to tell Opcache to check for a new version on disk at
a configurable interval.  Once that interval has passed after a deployment,
Opcache will go ahead and cache the new version.
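The interval is controlled by `opcache.revalidate_freq` (in seconds), which only has an effect while timestamp validation is enabled:

```ini
; Check files on disk at most once every five minutes.
opcache.validate_timestamps=1
opcache.revalidate_freq=300
```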

Of course this doesn't mean that Opcache won't need to restart.  The amount of
wasted memory will increase when the new files are cached and you'll be
filling up your memory and keys quicker than you would otherwise.

I think that's why most of the time people just restart Opcache when they
deploy new code.  It's the simplest approach, and if you do it during
scheduled downtime your users won't complain as much.

Finding your opcache settings

By default the Opcache settings are contained in your php.ini file.  This might
be different in your distribution.  Remember that you can go to your shell
and use the command `php --ini` to get a list of all of the ini files that PHP
is loading.  You can pass that list into grep to find all mentions of `opcache`
to narrow the list down.

Profiling Opcache

The only way to get a look into OpCache's running state is to use PHP functions
like opcache_get_status().  This function returns an array that contains a
fair amount of detail about the state of Opcache.
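A minimal sketch of such a status page; it requires the Opcache extension to be loaded, and the key names are from the array `opcache_get_status()` returns:

```php
<?php
// Dump a few headline Opcache numbers. Keep this page away from the public!
$status = opcache_get_status(false); // false: omit per-script details

$memory = $status['memory_usage'];
$stats  = $status['opcache_statistics'];

printf("Used memory:    %d bytes\n", $memory['used_memory']);
printf("Wasted memory:  %d bytes\n", $memory['wasted_memory']);
printf("Cached scripts: %d of %d keys\n",
    $stats['num_cached_scripts'], $stats['max_cached_keys']);
printf("Hit rate:       %.2f%%\n", $stats['opcache_hit_rate']);
```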

It is slightly inconvenient that you need to run PHP in order to view the
state of your Opcache.  It means that you need to have your webserver serve up
a page that makes these internal calls for you.  Naturally you'll configure
Nginx to allow access to that page only for requests from your IP, but it's
still a bit annoying that you have to do this.

Opcache is shared between all of your worker pools.  I'm assuming that because
you're reading a book about scaling you're not sharing your server with other
users.  In the event that there are other websites on the server you should
double check that your host has configured things properly.  By default any
PHP script is able to access the entire cache and so is able to read your
cached scripts.

For us the most important aspect of Opcache being shared between all worker
pools is that when we allocate memory we're sharing that between everything.
In other words if you allocate 128MB of RAM to Opcache then that is the total
to be shared and not the amount per worker or pool.

Right, so we know that Opcache is shared between all the workers in all the
pools.  This means that we don't have to use our actual application to profile
Opcache.  As long as a PHP script is run in PHP-FPM it will be able to tell us
how Opcache is working for all of the other workers.

The easiest way to do this is to upload a script written by somebody else to
your server.  There are several on Github but my favourite one was written by
Rasmus Lerdorf.  It's my favourite because it's a single file application which
I find easy to drop onto a server without affecting much else.  

All of these solutions could potentially leak sensitive information about your
PHP installation.  They require you to add security to make sure that the
endpoint is not accessible to the public.  This is just another thing to go
wrong.

An alternative is to rather use Zabbix to monitor Opcache.  Zabbix doesn't have
access to PHP so it cannot monitor Opcache directly, but it is extremely
flexible in allowing custom data imports.  You can very easily set up a PHP
script that outputs the Opcache data into a format that Zabbix can ingest.

This alternative is neater because it adds Opcache to your central server
monitoring and is less likely to vomit information about your setup to the
public.

You'll find pre-built solutions for this on Github.

In summary

Let's quickly sum up what we've been through.  We've learned that Opcache
improves the speed of your application by storing a compiled version of it
in memory.

We've learned about the three settings that are most important in tuning
Opcache:
  1. `memory_consumption` - which limits how much memory Opcache can use
  2. `max_accelerated_files` - which limits how many different files or versions of files Opcache can store
  3. `max_wasted_percentage` - which helps Opcache to decide whether it should restart when it is full, or rather force new files to get compiled whenever they're called.

We also looked at the `validate_timestamps` setting.  Setting this to zero
means that PHP won't need to make disk requests to see if a file has been
updated, but this can cause issues with deployment.

02 June 2017

Logging at scale in PHP

Image: Pixabay
Logging can be an expensive operation if you're writing to disk.  It's probably not a problem for your application at low volumes, but as you start having even just tens of thousands of users on your site at the same time the I/O becomes expensive.

An obvious way to help reduce the I/O is to turn debug logging off when you're in production.  You only want to log messages that you need to keep and not a full trace of every person's visit to the site.  This does reduce the logs' usefulness if you do later try to use them to debug a problem.

If you're using the Monolog package then you should know about the "fingers crossed" handler.  In this mode of operation Monolog will only output debug log lines if there is a warning or error log too.  This lets you log less stuff in general cases but still have full debug logs in the event of a problem.

🛈 Laravel and Symfony both use Monolog for their logging.  
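A minimal sketch of wiring up the fingers crossed handler; it assumes Monolog is installed via Composer, and the log path and channel name are illustrative:

```php
<?php
require 'vendor/autoload.php';

use Monolog\Logger;
use Monolog\Handler\StreamHandler;
use Monolog\Handler\FingersCrossedHandler;

// Buffer every record in memory; only flush the buffer to the inner
// handler when a record at WARNING level or above arrives.
$inner = new StreamHandler('/var/log/app.log', Logger::DEBUG);
$log   = new Logger('app', [new FingersCrossedHandler($inner, Logger::WARNING)]);

$log->debug('held in memory, not written');
$log->warning('problem! the buffered debug line is flushed along with this');
```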

Another problem that emerges when you start scaling your application is that inevitably you're going to be running your application in more than one place.  How do you sensibly log when a user's logs are split up over several different servers or in multiple service containers?

The answer is to centralize your logs.  By sending your logs to one central logging service you'll be able to make sense of them.  Another advantage to this sort of service is that they'll offer functionality like being able to search and aggregate the logs.

The key point to consider when you're looking at centralizing your logging is how to get your logs to it.  You do not want your PHP application to need to contact other servers to send logs!

Linux already has the rsyslog program which offers a way to asynchronously send logs to another server.  Any solution that you decide on should either use this service or offer the same functionality.

Setting it up is very simple, you configure PHP to write logs to "syslog" instead of to a file.  Then you configure rsyslog to forward your syslog on to the other service.  Most of the logging platforms will have scripts to help you do this, but it is really very easy to set up.
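The PHP side is a single ini setting; forwarding is then rsyslog's job:

```ini
; php.ini: write PHP's error log to the local syslog instead of a file.
error_log = syslog
```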

For example, adding this line to your rsyslog configuration will cause it to forward all of your syslogs to a Graylog host over UDP:

    *.* @graylog.example.org:514;RSYSLOG_SyslogProtocol23Format

There are several commercial logging systems like Loggly and Splunk.  These are great options because you won't need to worry about maintaining the logging service and can just use it as a paid service.

If you would prefer to use your own servers then Graylog is an excellent open source alternative.

06 May 2016

CloudFlare CDN

CloudFlare offers a free tier CDN and I've been very impressed by the value they give away for nothing.  I think if I were to recommend a CDN to start with then CloudFlare is a great choice.  They have easily accessible documentation, an easy to use panel, and offer a range of security features not offered by Amazon.

CloudFlare network map
As with Amazon it's very easy to set up SSL.  CloudFlare will give you a free certificate for your site that will protect browsers connecting to the edge locations.  If you want to encrypt the traffic from CloudFlare to your origin server you can install an SSL certificate onto your server (provided free by CloudFlare, or roll your own).

CloudFlare offers a web application firewall and you can buy pen-testing services through its marketplace.

One potential drawback of using CloudFlare is that it requires you to point your domain nameservers at CloudFlare's.  This can be helpful because scalable DNS is essential, but personally I prefer to use Google Cloud DNS or Amazon Route 53 to provide large-volume DNS service.