RabbitMQ: rabbitmqctl fails after previously working

The problem

OK, so we modified one of our applications to replace ActiveMQ with the much more reliable RabbitMQ.  During this process I was configuring clustering and adding new users and virtual hosts, all with scripts that made use of rabbitmqctl, so rabbitmqctl was clearly working correctly.  Note that this was on Windows 2008; the problem is unlikely to affect Linux.

So I RDP'ed into the Windows box the next day, intending to configure some scripts to pull queue size stats out of RabbitMQ for our Kibana reporting environment.  These scripts get the queue size by calling rabbitmqctl from a batch file.  On the first run I noticed that the script wasn't working.  Digging further, I found that if I issued the command:

rabbitmqctl status

Then I received the response

Status of node 'rabbit@SVR1' ...
Error: unable to connect to node 'rabbit@SVR1': nodedown

DIAGNOSTICS
===========

attempted to contact: ['rabbit@SVR1']

rabbit@SVR1:
* connected to epmd (port 4369) on SVR1
* epmd reports: node 'rabbit' not running at all
no other nodes on SVR1
* suggestion: start the node

current node details:
- node name: 'rabbitmqctl23333@SVR1'
- home dir: C:\Users\me
- cookie hash: <deleted>

Very strange, as this same command had been working the previous day.  I knew that the command had worked from my account yesterday, and I also knew that I had the correct Erlang cookie in my home folder.  So this really should have been working.

After getting some help from the super helpful Simon MacMullen on the RabbitMQ mailing list, I identified the problem as follows.

I had installed RabbitMq, then with the Rabbit service not running I issued the command “rabbitmqctl status”.  This caused Erlang to start the epmd.exe process but note that this was running under my user account.  I then started the RabbitMq service, which registered itself with the running epmd.

See here for more information on the epmd daemon: http://www.erlang.org/doc/man/epmd.html

This daemon is used when interacting with Rabbit using Rabbitmqctl, but it is also used for configuring clustering.

So at first rabbitmqctl and epmd worked fine, but because the epmd process was running under my user account, it was killed when I logged off.  The Rabbit service kept running but was no longer registered with any epmd.  When I logged back in, rabbitmqctl started a fresh epmd that knew nothing about the running Rabbit node, which is why it reported the node as not running.

My tests also show that if Rabbit is not registered with epmd correctly, clustering will not work.

For instance, if I create the same situation on the master for the cluster, then restart one of the slave servers, the slave will not be able to connect to the cluster and will fail to start.

Simon indicates that rabbitmqctl, clustering and rabbitmq-plugins in 3.4.0+ will not work if Rabbit is not registered with epmd.

This is quite a serious problem because it breaks clustering: had we not noticed it on production and the slave service had been bounced, the slave would not have come back up.  We do have monitoring for the service failing to start, but it still wouldn't have been much fun.

The exact steps to reproduce the problem are:

  1. Make sure no epmd is running.
  2. Stop the Rabbit service.
  3. Run rabbitmqctl status; this starts epmd under your own user account.
  4. Start the Rabbit service.
  5. Run rabbitmqctl status and notice that it works.
  6. Log off.
  7. Log back in.
  8. epmd has now exited, having been killed during logoff.
  9. Run rabbitmqctl status and notice that it no longer works.

The fix

If you have a downtime window, kill any running epmd executables and restart the RabbitMQ service.  If, like us, you don't have that luxury, then follow the steps below.

Thanks again to Simon for suggesting the fix and Sysinternals for their tools.

  1. Download the Sysinternals pstools from here: http://technet.microsoft.com/en-gb/sysinternals/bb896649.aspx
  2. Extract psexec and copy it to the destination server.
  3. Kill any running epmd.exe
  4. Start an Administrator command line
  5. Run the command psexec -s "c:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.3.5\sbin\rabbitmqctl.bat" status
    1. You might need to adjust the path to match your installed Rabbit.
  6. Check using ProcessExplorer that epmd.exe is running as the user: NT Authority\System
  7. The first time you do this you might need to click through a license agreement for the Sysinternals tool.  If that causes the command to fail, repeat steps 3-6.
  8. Start a second command line window
  9. In the second window run the Erlang shell: "c:\Program Files\erl6.2\bin\erl.exe"
    1. Again, use the path to your installed version.
  10. In the Erlang shell enter:
    1. erl_epmd:start().
      1. This should return {ok, SomeProcessIdentifier}
    2. erl_epmd:register_node(rabbit, 25672).
      1. Replace rabbit in the above statement with your node name (the part before the @ in rabbit@SVR1, not the hostname) and the second parameter with the clustering port.
      2. This should return {ok, SomeNumber} if successful, and {error, SomeError} otherwise.
  11. Hopefully in your case the above worked.
  12. Now type the following into the Erlang shell, but do not hit return to execute it yet:
    1. halt().
  13. We now have registered Rabbit with epmd, but when our Erlang exits it will unregister.  Therefore, we need to do a little bit more work.
  14. With the other prompt still open and ready for return, start the RabbitMq command prompt.
  15. The following will spawn an Erlang process that waits 10s before registering RabbitMq with epmd.
  16. Run the following at the command line (again, adjust the node name and port in the erl_epmd:register_node call):
    1. rabbitmqctl eval "spawn(fun()->timer:sleep(10000), erl_epmd:register_node(rabbit,25672)end)."
  17. Quickly hit enter in the Erlang shell (you have 10s due to the 10000 ms parameter).
  18. Now after 10s you should be able to log out, log back in and get rabbitmqctl to work.
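
If you would rather not eyeball ProcessExplorer for step 6, the same check can be scripted.  The following is only a minimal Python sketch and was not part of the original fix.  It assumes you feed it the output of tasklist /V /FO CSV /FI "IMAGENAME eq epmd.exe" from an English-language Windows install (the column headers are localised on other installs), and the helper name epmd_owners is made up for illustration.

```python
import csv
import io


def epmd_owners(tasklist_csv):
    """Return the 'User Name' value for every epmd.exe row in tasklist CSV output.

    The text is assumed to come from something like:
        subprocess.check_output(
            ["tasklist", "/V", "/FO", "CSV", "/FI", "IMAGENAME eq epmd.exe"],
            text=True)
    """
    reader = csv.DictReader(io.StringIO(tasklist_csv))
    return [
        row["User Name"]
        for row in reader
        if (row.get("Image Name") or "").lower() == "epmd.exe"
    ]


# Hypothetical sample output, for illustration only.
sample = (
    '"Image Name","PID","Session Name","Session#","Mem Usage",'
    '"Status","User Name","CPU Time","Window Title"\n'
    '"epmd.exe","1234","Services","0","2,104 K",'
    '"Unknown","NT AUTHORITY\\SYSTEM","0:00:00","N/A"\n'
)

owners = epmd_owners(sample)
# epmd survives a logoff only if every copy runs as the SYSTEM account.
safe = bool(owners) and all(o.upper() == "NT AUTHORITY\\SYSTEM" for o in owners)
print(owners, safe)
```

An empty list means no epmd is running at all; any owner other than NT AUTHORITY\SYSTEM means epmd will die at the next logoff.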

Preventing it happening:

This is a bug in RabbitMQ on Windows that will be fixed in a future release; the bug number for the release notes is 26426.

Simon suggested the fix will be to stop rabbitmqctl or rabbitmq-plugins from starting epmd if it is not already running.


Sql Server Mirroring and DB/Log File Growth

This post covers a nasty surprise that hit me yesterday while doing some load testing and fail-over testing with SQL Server 2008 R2.  This particular database had been set up a long time ago with some surprisingly mad defaults, including the usual problem of percentage-based database/log file growth as opposed to fixed-size growth.

Anyway this particular database had for reasons lost to the sands of time been configured with an almost ridiculous growth.  This had been corrected on the principal server and I had assumed that this change was replicated to the mirror.

I was proved completely wrong in this assumption.

The mirror continues to maintain its own separate database and log file growth settings, which while it is a mirror cannot be changed!

Therefore, I flipped the principal and mirror round, let the load testing continue and went home.  The next day I found that the system and all tests had failed completely after a couple of hours.  Now that the mirror was the principal and was using its own file growth settings, it had tried to extend the database file by more than the free space on the drive.  This caused mirroring to fail and, with it, the application.

In fact mirroring had to be re-initialised by copying a backup from the old-mirror to the old-principal and re-configuring mirroring.

The following code taken from http://www.handsonsqlserver.com/how-to-view-the-database-auto-growth-settings-and-correct-them/ will show, even on a mirror, the database growth settings:

-- auto growth settings for data and log files
select DB_NAME(mf.database_id) database_name
, mf.name logical_name
, CONVERT (DECIMAL (20,2) , (CONVERT(DECIMAL, size)/128)) [file_size_MB]
, CASE mf.is_percent_growth
WHEN 1 THEN 'Yes'
ELSE 'No'
END AS [is_percent_growth]
, CASE mf.is_percent_growth
WHEN 1 THEN CONVERT(VARCHAR, mf.growth) + '%'
WHEN 0 THEN CONVERT(VARCHAR, mf.growth/128) + ' MB'
END AS [growth_in_increment_of]
, CASE mf.is_percent_growth
WHEN 1 THEN CONVERT(DECIMAL(20,2), (((CONVERT(DECIMAL, size)*growth)/100)*8)/1024)
WHEN 0 THEN CONVERT(DECIMAL(20,2), (CONVERT(DECIMAL, growth)/128))
END AS [next_auto_growth_size_MB]
, CASE mf.max_size
WHEN 0 THEN 'No growth is allowed'
WHEN -1 THEN 'File will grow until the disk is full'
ELSE CONVERT(VARCHAR, mf.max_size)
END AS [max_size]
, physical_name
from sys.master_files mf
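
To make the next_auto_growth_size_MB arithmetic in the query above concrete, here is a small Python sketch of the same calculation.  It is only an illustration: the function names and the example figures (a 200 GB file growing by 50%, 50 GB of free disk) are hypothetical and not taken from the real system.  Like the query, it relies on size and fixed growth being stored as counts of 8 KB pages.

```python
PAGE_KB = 8  # sys.master_files reports size and fixed growth in 8 KB pages


def next_growth_mb(size_pages, growth, is_percent_growth):
    """Same arithmetic as the next_auto_growth_size_MB column in the query."""
    if is_percent_growth:
        # growth is a percentage of the current file size
        return size_pages * growth / 100 * PAGE_KB / 1024
    # growth is a fixed number of 8 KB pages
    return growth * PAGE_KB / 1024


def growth_fits(size_pages, growth, is_percent_growth, free_mb):
    """True if the next auto-growth would fit in the free disk space."""
    return next_growth_mb(size_pages, growth, is_percent_growth) <= free_mb


# Hypothetical example: a 200 GB data file, expressed in 8 KB pages.
size_pages = 200 * 1024 * 1024 // PAGE_KB

percent_step = next_growth_mb(size_pages, 50, True)     # 50% growth
fixed_step = next_growth_mb(size_pages, 12800, False)   # 12800 pages

print(percent_step, fixed_step)
print(growth_fits(size_pages, 50, True, free_mb=50 * 1024))
```

With percentage growth the step scales with the file, so a setting that was harmless on a small database can ask for more space than the drive has once the file is large; a fixed step stays the same size.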

The site mentioned above also shows the command for changing the growth settings of a particular database from a script.

From my testing, running it was impossible while the database was acting as a mirror.

Conclusion

In conclusion, be sure that you set the principal up correctly before you take the backup to restore on the mirror.

If you find out that the growth settings need to be changed, then you'll need to replicate the change on the mirror.  Most likely, during some downtime, you will flip who is principal, apply the changes to what was the mirror, and then flip the principal back to the original server.


Fixing broken iMovie projects

So I had been editing an iMovie project for my father's 60th birthday that contained photos and videos from his life.  I had been working on it on and off for approximately two months.  Like an idiot I had not been backing up the project, and should receive a slap for that.

Last night I added some further video to the middle of the project and everything was working as normal.  I then adjusted this video to slow it down further.

After making this adjustment I noticed that my preview video display window was greyed out and did not show any of the actual video.  Selecting anywhere on the timeline and hitting play did not work.

Panic ensues!  I closed iMovie and re-opened it.  Now the Edit Project option was disabled for that particular project but enabled for the other projects.  Further panic.

I tried a reboot and that did nothing.

I also downloaded the trial of Final Cut Pro X so that I could attempt to use it. I didn’t really have the time to become completely familiar with it. However, it was able to successfully open and import the project.

In the end I tracked down a post online (reference forgotten, sorry :<) that mentioned that one video might be corrupting the whole project. I remembered that I had just added a particular video, so I fired up iMovie and deleted it. Then I crossed my fingers and ate some lucky rabbits' feet and horseshoes.

Anyway this actually fixed the project and I was able to use it again!


Sql Server Include Indexes

While researching another issue today I came across a feature of Sql Server that I had not noticed or heard of before: Included columns in non-clustered indexing.

Now before we discuss what this is, let's examine how clustered indexing works in Sql Server, in relation to how the actual fields are stored.

For a clustered index in Sql Server, which is a B-tree (http://msdn.microsoft.com/en-us/library/ms177443(v=sql.105).aspx) the leaf contains the actual data for the row.

So whenever Sql Server arrives at the leaf node it does not have to do any further lookups to get the data that you want in your query.

Now if you use a non-clustered index in your query then Sql Server will still have to perform the lookup to the clustered index to get the actual data, in the case that the columns you requested are not covered by the index.

Let's have a look at an example, starting with the table in question.

CREATE TABLE [dbo].[testing](
[id] [int] NOT NULL,
[name] [varchar](25) NOT NULL,
[phone] [varchar](25) NOT NULL,
[gender] [varchar](10) NOT NULL,
CONSTRAINT [PK_testing] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Then let us add an index to that table.

CREATE INDEX IX_Testing_Name ON testing
(
name ASC
)

So if we were to now run a query:

SELECT id, name FROM testing
WHERE name LIKE 'Steve%'

This query would use the index IX_Testing_Name.  Both the name and id fields are available in the leaf of that index (id because, as the clustered key, it is stored there as the row locator), and therefore no further lookups would be required.

In our simplified example it would use this index but obviously this would depend on the Query Optimiser on a real server.

So if we now run the query:

SELECT id, name, phone
FROM Testing
WHERE name LIKE '%ScrivemasterFlash%'

In this case the phone column is not available in the non-clustered index and Sql Server must use the row locator, which for this clustered table will be the id, to lookup the clustered index leaf node and fetch the phone column.

If we wanted to avoid this lookup by keeping a column in the index leaf node, without actually indexing on that column, then we could use Sql Server's indexes with included columns. This saves space compared with adding the column to the index key, because an included column is stored only at the leaf level of the index.

-- DROP_EXISTING replaces the IX_Testing_Name index created earlier
CREATE INDEX IX_Testing_Name ON testing
(
name ASC
) INCLUDE (phone)
WITH (DROP_EXISTING = ON)

 
Now if we repeat our previous query:

SELECT id, name, phone
FROM Testing
WHERE name LIKE '%ScrivemasterFlash%'

 

Sql Server is able to use our included column and does not need to lookup the clustered index leaf node to return the results.

For more information, look here at MSDN:

Indexes with Included Columns
Nonclustered Index Structures
Clustered Index Structures


Spring configuration and enums

Had some trouble today with Spring when constructing some beans that were enums.  This had previously been working but stopped working for some bizarre reason.

The previous  definition had been long winded:

<bean id="blue" class="com.demo.EnumColours" factory-method="valueOf">
<constructor-arg>
<value>BLUE</value>
</constructor-arg>
</bean>

I was able to replace this with the following, which Spring seemed happy with (it needs the util namespace declared on the beans element).

<util:constant id="blue" static-field="com.demo.EnumColours.BLUE" />

This does not really make sense, as the project had been working for a number of weeks before I renamed the Spring configuration xml file and hit this error.


Tomcat 7 static file caching

I was working with a JavaScript developer today and he complained that updates to his static files were not being reflected when he did an HTTP GET from his browser.

It seems that Tomcat 7 has a cachingAllowed parameter that can be set in the context.xml.  I set this to false (it defaults to true) and restarted the web application.  See the documentation here:

http://tomcat.apache.org/tomcat-7.0-doc/config/context.html

This made no difference and the same problem occurred again.  Investigating further I found the following link: http://serverfault.com/questions/40205/how-do-i-disable-tomcat-caching-im-having-weird-static-file-problems

This made clear that if the antiResourceLocking parameter is set to true then the static content will be cached anyway.

Setting this parameter to false fixed the problem so that I could get on with some real work and the javascript developer could do some quick testing.


Hibernate mapping files DTD problem

So I have been working on splitting up a monolithic older project for my company.  I was ensuring that we had no warnings in our logs when our freshly Mavenised web application starts up in Tomcat 7.0.32.

One of the warnings I found and was trying to remove was:

WARN – recognized obsolete hibernate namespace http://hibernate.sourceforge.net/. Use namespace http://www.hibernate.org/dtd/ instead. Refer to Hibernate 3.6 Migration Guide!

Ok so I had a look at our HBM mapping files and they included the DTD link to

<!DOCTYPE hibernate-mapping PUBLIC "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">

An HTTP call to this address gets a 301 to http://www.hibernate.org/dtd/hibernate-mapping-3.0.dtd, which then gets a 301 to http://www.jboss.org/dtd/hibernate/hibernate-mapping-3.0.dtd.

Ok great so I updated all our files to use http://www.jboss.org and I thought that would be the end of it.

I then noticed that our web application was taking longer to start.  As I was testing within Eclipse, I had not adjusted the default Tomcat 7 start time allowance of 45 seconds.  This had previously been enough time, but no longer!

A look at the logs showed that Hibernate was taking an extra 25s to get up and running.  Strange indeed.

I started the application under conditions where there was no network connection and it failed to start at all due to not being able to contact http://www.jboss.org.

After some old fashioned Googling for the issue I discovered that the dtd files should be available in the Hibernate.jar file.

The version of Hibernate (3.6.2) we were using was older but we had no need at that time to upgrade and had a very tight schedule and small explicit set of requirements to meet.

Examining the DTD inside the Hibernate.jar (/org/hibernate/hibernate-mapping.dtd), I saw that it defines itself as pointing at http://www.hibernate.org, most likely because it is a much older version of Hibernate.

Therefore, adjusting our XML files to point to http://www.hibernate.org fixed:

  • The warning message in the logs.
  • The slow start up time.
  • The failure to start without a network connection.
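
If you have a lot of HBM files to adjust, the edit can be scripted.  The following is a rough Python sketch and not what I actually ran.  It assumes your mapping files carry the standard Hibernate 3.0 DOCTYPE, the function name fix_doctype is made up for illustration, and it only touches the DOCTYPE declaration so that any other mention of the old address is left alone.

```python
import re

OLD_BASE = "http://hibernate.sourceforge.net/"
NEW_BASE = "http://www.hibernate.org/dtd/"


def fix_doctype(xml_text):
    """Swap the obsolete DTD address for the current one, inside the DOCTYPE only."""
    def swap(match):
        return match.group(0).replace(OLD_BASE, NEW_BASE)

    # [^>]* also matches newlines, so a DOCTYPE split over several lines is handled.
    return re.sub(r"<!DOCTYPE[^>]*>", swap, xml_text, count=1)


# Illustrative mapping file header (assumed standard Hibernate 3.0 DOCTYPE).
sample = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE hibernate-mapping PUBLIC\n'
    '    "-//Hibernate/Hibernate Mapping DTD 3.0//EN"\n'
    '    "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">\n'
    '<hibernate-mapping/>\n'
)

print(fix_doctype(sample))
```

To apply it for real you would read each .hbm.xml file, pass its text through fix_doctype and write the result back, ideally after checking the changes into version control first.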