Palo Alto OpenStack Operators Meetup

I am really early for my flight home to Edmonton from the San Francisco airport. Maybe I didn't need to be as concerned about traffic as I was. I bought "The Martian", so I should be reading it, but I'm not; instead I'm writing this blog post about the Palo Alto OpenStack Operators mid-cycle meetup that I attended.

If you want to see the full, official notes from each session, you can find links to them from this etherpad page.

My notes from the meetup

Here are some general notes I made from the sessions. Sorry the below is rambling, but that’s how I think. Also these are my interpretations of what was said, and I’ve been wrong before. If you notice something incorrect, just let me know in the comments and I’ll change it.

  • The vast, vast majority of operators have moved off of nova-network and are on Neutron now. Only one person had their hand up for running nova-network, though I'm sure there are more; they just didn't respond to the informal poll. I think it's good that Neutron is now the standard. Neutron is more complicated than nova-network and doesn't necessarily meet everyone's needs, but I think having a common base will help get things moving forward. The large deployments team is working on helping the Neutron team create a model that works for them, and I think once that happens everyone will be happy. For the record, my Neutron "model" is to use Midonet with VXLAN. I really like Midonet, and the overlay model fits our customer base well. Several large deployments prefer straight-up L3-based networks, and I'm sure Neutron will provide that model soon. That said, it will be hard for Neutron to meet the requirements of every single network model.
  • RabbitMQ isn't as much of a burning issue as it used to be. People still seem to have no idea whether to run it behind a load balancer or to list all the rabbits in the config (a sketch of both options follows this list). Still quite confusing. However, there is a definite consensus that Kilo is much better with rabbit. I'd like to backport some of the things that make it better, as we are still on Juno, but I'm not sure we'll have time for that. Soon we will upgrade from Juno to Kilo anyway. Another point made was that you probably have issues whether you use a load balancer or not, so either way…
  • Also, someone from RabbitMQ was at the meeting and gave an update. They are working on making RabbitMQ easier to operate. They are also keenly aware of how much it is used in OpenStack, and may provide OpenStack-specific documentation on using RabbitMQ. That would be great.
  • CMDB was a big topic at this meetup. Lots of people want a good CMDB so that they can start to do capacity planning, which suggests there are many large OpenStack installations out there.
  • HP talked about their billing solution. It's fairly complex… as far as I can remember, it pulls information from various sources, such as rabbit, and sticks it into Hadoop; jobs crunch that data and store it in a SQL database, which is fronted by an API that internal systems can query for billing. They intend to open source this system in some fashion, or maybe just parts of it. Not sure.
  • Burning issues: The etherpad can be found here.
  • Many, many hands were put up for Neutron as a “burning issue,” but when the time came around to actually complain about it, no one really said anything, or perhaps I just missed it. My impression is that Neutron is working pretty well for most.
  • Ceilometer is still an issue, but people are also still using it, and many are investing heavily in getting it working. I think Ceilometer is an important project, and I feel like eventually it'll be production worthy. I keep forgetting Heat can use it for autoscaling. We stopped using it due to performance issues on the backend (like many others), but we also didn't put that much effort into it. Many other metrics-gathering systems are resource intensive and also use backends like Mongo. If you throw too much data at a backend it stops working, no matter what technology you're using. One provider mentioned using InfluxDB with Ceilometer, I believe with statsd as an intermediary.
  • Keystone was also listed fairly high in terms of burning issues. Many clouds would like more flexible roles, as there are basically only three right now: admin, member, and no role. As usual, the policy JSON files were mentioned as being complex and somewhat undocumented (a small example follows this list). I'm personally not sure how much effort to put into the policy files if there are only three roles. Something to look into. The important thing to know is that the Keystone project is well aware of these requirements and is working on them. Also, many clouds have added an extra layer on top of Keystone to allow some tenants to do admin-like functions, such as adding users to their own project. But that functionality is "hacked" in, so to speak, by the providers.
  • I didn't get to ask my question about putting Keystone tokens in memcached. I read a bug a while ago where it was mentioned that the memcached driver being used is not that great, and that using memcached isn't recommended, even though it's part of the official install docs (or was the last time I looked). Darned if I can find that bug listing again. Several providers have moved to Fernet tokens, which don't need to be stored, though they are slower. I would imagine that once we get to Kilo we will use Fernet tokens as well and just avoid the whole caching/storage issue (both setups are sketched after this list).
  • Containers: There was some discussion about containers. It seemed forced. Some people use containers in their OpenStack infrastructure… LXC, Docker. I will probably use so-called "fat" containers, i.e. LXC, for some infrastructure. Yeah, containers are still hot. Docker is cool. Multi-tenancy is hard, etc., etc.
  • Install guide: There was at least one mention of how a particular technology was used in someone's OpenStack cloud simply because it was a step in the install guide. I think the install guide is a lot more important than people realize in terms of how people deploy OpenStack. If memcached is listed as an install step, then people will use it. If OVS is mentioned in the install guide, then people will use it (as opposed to Linux bridge, which large operators seem to prefer, at least at this time).
  • Many operators use MySQL/MariaDB and Galera. Most using this combination, if not all, only write to one node. Usually a virtual IP points to one node and clients read and write to it. I think one operator had managed to get read-only APIs up and running against the read nodes of a Galera cluster. That is something I would like to do: at least send reads out to any node, but writes to one node (the haproxy sketch after this list shows the usual single-writer setup). It was noted that CERN does this for Ceilometer to make sure it doesn't overload the write APIs. I've been looking at some MySQL proxies that can do this, but maybe it's just extra complexity that I don't need.
  • Everyone loves haproxy. I quite like haproxy myself. Some ops prefer it over the $$$ commercial solution they are also using. It’s very powerful and is working well for me, and I, like others, terminate SSL with it.
  • Most operators have to do some database cleanup. I’m not clear on all the things that need to be cleaned up. There is at least one project, ospurge, that might help get some of the way towards all the cleanup jobs that need to happen. 80% of the time it’ll get you 60% of the way.
  • Every summit and operator meetup has a call for a place to share tools. At every summit and meetup, osops is mentioned, and that is where the conversation ends. osops has been around for quite some time, but there aren't many contributions to it. One of the great things about OpenStack is how you can run it in almost any way you want, but that also means every OpenStack install is different, which makes it somewhat hard to share tools… which is why I think this repo doesn't get much love.
  • There was talk about sharing Grafana configs. Then there was the mention that the new Kibana (not Grafana, Kibana) makes it difficult to import/export configs. I'm doing some testing with Grafana and InfluxDB for metrics, so it would be great to at least see what others are graphing, if not actually use their graphs.
  • Dragonflow was mentioned as a very new project. I don't know anything about it other than what it says on the GitHub repo: "Dragonflow - SDN based Distributed Virtual Router for OpenStack Neutron." Even though I'm not an expert at networking, I am fascinated by SDN.
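
Since the RabbitMQ question comes up at every one of these meetups, here is a rough sketch of the two configurations people debated: listing every broker versus pointing at a load balancer. This is a minimal Juno-era example for nova.conf; the hostnames and VIP are made up, and option names can vary between releases.

    # Option 1: list every broker and let oslo.messaging fail over between them
    [DEFAULT]
    rabbit_hosts = rabbit1:5672,rabbit2:5672,rabbit3:5672
    rabbit_ha_queues = true

    # Option 2: point everything at a load balancer VIP instead
    # [DEFAULT]
    # rabbit_host = rabbit-vip.example.com

As was noted in the session, neither option makes the problems disappear; it mostly moves where the failover logic lives.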
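For the Keystone roles discussion, here is a tiny, hypothetical policy.json fragment, just to show what the files look like and why there isn't much to express when the only roles are admin and member. The rule names are illustrative, not copied from any real deployment.

    {
        "context_is_admin": "role:admin",
        "compute:create": "role:admin or role:member",
        "compute:delete": "rule:context_is_admin"
    }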
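And on token storage, a sketch of the two keystone.conf setups mentioned above, as I understand them from the Kilo-era docs; the class paths changed between releases, so treat these as approximate.

    # Memcached-backed token persistence (the setup I had questions about)
    [token]
    driver = keystone.token.persistence.backends.memcache.Token

    # Fernet tokens (Kilo+): nothing is persisted, but signing keys must be
    # created first with `keystone-manage fernet_setup` and copied to all nodes
    # [token]
    # provider = keystone.token.providers.fernet.Provider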
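Finally, the two haproxy patterns that came up: SSL termination in front of an API, and the single-writer Galera setup where one database node takes all the traffic and the rest are standbys. All IPs, ports, the cert path, and the check user below are placeholders.

    # Terminate SSL at haproxy and speak plain HTTP to the API backends
    frontend nova-api
        bind 10.0.0.10:8774 ssl crt /etc/haproxy/certs/cloud.pem
        default_backend nova-api-nodes

    backend nova-api-nodes
        balance roundrobin
        server api1 10.0.0.21:8774 check
        server api2 10.0.0.22:8774 check

    # Galera: writes go to one node; the others only take over if it dies
    listen galera
        bind 10.0.0.10:3306
        option mysql-check user haproxy
        server db1 10.0.0.31:3306 check
        server db2 10.0.0.32:3306 check backup
        server db3 10.0.0.33:3306 check backup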

Public cloud

There were also some concrete steps towards getting a public cloud group going in/around OpenStack. I think this is great and hope to participate as much as possible. There may not be many public clouds based on OpenStack, and some are much larger than others, but I think this is an important sub-group of operators, and is not necessarily the same as the large deployments team. Public clouds definitely have some specific requirements.

Short trip

For me it was a short trip. We got in late Monday night and had to leave at about 3 PM on Wednesday, so we missed some of the tail end of the meetup, mostly the Tokyo planning, which was unfortunate but unavoidable due to California traffic and flight times.

Thanks to HP and GoDaddy for sponsoring, and for all the work put into the meetup by the foundation and volunteers. While I'm not a big fan of the Palo Alto location [1], the meetup certainly met its goal of enabling the exchange of ideas and practical OpenStack experiences.

If you're wondering, my flight back was great because there was an extra seat beside me, and it's only a 2.5-hour trip back to sunny Edmonton from overcast SFO. :) Also, I read almost all of "The Martian." If you like NASA, you will read this book in one sitting; it's a fascinating and thrilling read with tons of science.


1: Palo Alto hotels are expensive and you pretty much have to rent a car. It's great that companies are willing to sponsor the meetup with a location and food and such, but it seemed to me like the costs were just transferred to hotels and rental cars. Then again, a large number of attendees were from California, so maybe it all evens out.