Wednesday, January 22, 2014

Upgrading from SolrCloud 4.x to 4.y - as if you had used 4.y all along

Introduction

 

I have recently been upgrading one of my systems that uses Solr in "cloud"-mode - that is, using ZooKeeper to make several Solr-nodes work together in a cluster. I upgraded from Solr 4.0.0 to 4.4.0, but this post is also relevant for several other combinations of from- and to-versions - you just need to pick out the parts that apply to you.

Basically you can just stop your Solr-nodes, copy the new binaries and start the Solr-nodes again, but
  • If you upgrade from 4.0.0 to a newer version, routing will not work the same way anymore
  • If you, like me, want the state/configuration of the system after the upgrade to be as it would have been if you had used 4.4.0 all along, there are several differences you need to correct

Differences

 

Let me go through the differences I encountered between a 4.0.0-to-4.4.0 upgraded system and a clean 4.4.0 system

clusterstate.json

clusterstate.json lives in the root (of Solr-area) in ZooKeeper. It contains information about collections, shards and replicas.

Before the upgrade my clusterstate.json looked like this (Solr 4.0.0)
 
{
  "my_collection_1":{
    "shard1":{
      "range":"80000000-8ccccccc",
      "replicas":{"my_host_1:8983_solr_my_collection_1_shard1_replica1":{
          "shard":"shard1",
          "roles":null,
          "state":"active",
          "core":"my_collection_1_shard1_replica1",
          "collection":"my_collection_1",
          "node_name":"my_host_1:8983_solr",
          "base_url":"http://my_host_1:8983/solr",
          "leader":"true"}}},
    "shard2":
    ... 19 more shards under "my_collection_1" ...
  },
  "my_collection_2":
  ... 23 more collections ...
}

If I had used Solr 4.4.0 all along my clusterstate.json would have looked like this
 
{
  "my_collection_1":{
    "shards":{
      "shard1":{
        "range":"80000000-8ccccccc",
        "state":"active",
        "replicas":{"core_node1":{
            "state":"active",
            "core":"my_collection_1_shard1_replica1",
            "node_name":"192.168.xxx.yyy:8983_solr",
            "base_url":"http://192.168.xxx.yyy:8983/solr",
            "leader":"true"}}},
      "shard2":
      ... 19 more shards under "my_collection_1" ...
    },
    "router":"compositeId"
  },
  "my_collection_2":
  ... 23 more collections ...
}

Differences
  1. All shards at collection-level have been wrapped inside a shards-map in 4.4.0
  2. router=compositeId has been added at collection-level in 4.4.0 (the shards-map is probably there to make room for such key-values at collection-level)
  3. state=active has been added at shard-level in 4.4.0
  4. Replica-keys/names have changed from the form <hostname>:<port>_<context>_<collection-name>_<shard-name>_replica<X> to just core_node<Y>
  5. shard, roles and collection are no longer present at replica-level in 4.4.0
  6. node_name and base_url are now based on IP instead of hostname

Now, if I just stop my 4.0.0 Solr-nodes, do the upgrade (copy the new Solr 4.4.0 binaries) and start the Solr-nodes again, clusterstate.json is automatically changed to look like this
 
{
  "my_collection_1":{"shards":{
      "shard1":{
        "range":"80000000-8ccccccc",
        "replicas":{"my_host_1:8983_solr_my_collection_1_shard1_replica1":{
            "state":"active",
            "core":"my_collection_1_shard1_replica1",
            "node_name":"my_host_1:8983_solr",
            "base_url":"http://my_host_1:8983/solr",
            "leader":"true"}},
        "state":"active"},
      "shard2":
      ... 19 more shards under "my_collection_1" ...
  }},
  "my_collection_2":
  ... 23 more collections ...
}

Which of the differences between 4.0.0 and 4.4.0 were automatically "corrected" by 4.4.0 when it was started on top of a system that used to run 4.0.0?
  1. All shards at collection-level have been wrapped inside a shards-map. Check!
  2. router=compositeId has not been added at collection-level :-(
  3. state=active has been added at shard-level. Check!
  4. Replica-keys/names have not been changed :-(
  5. shard, roles and collection have been removed from replica-level. Check!
  6. node_name and base_url are not based on IP :-(
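
The items marked :-( above can be detected mechanically once one collection entry of clusterstate.json has been parsed into nested maps. The following is only a minimal sketch and not Solr code: the class ClusterStateLeftovers and its method names are made up for illustration, it assumes the JSON has already been parsed with a parser of your choice, and its IPv4 check is deliberately crude.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Solr): reports 4.0.0 leftovers in one parsed collection entry. */
final class ClusterStateLeftovers {

    /** @param collection the parsed value stored under e.g. "my_collection_1" in clusterstate.json */
    static List<String> find(Map<String, Object> collection) {
        List<String> leftovers = new ArrayList<>();
        if (!collection.containsKey("router")) {
            leftovers.add("router missing at collection-level");
        }
        Object shards = collection.get("shards");
        if (!(shards instanceof Map)) {
            leftovers.add("shards are not wrapped in a shards-map");
            return leftovers;
        }
        for (Map.Entry<?, ?> shard : ((Map<?, ?>) shards).entrySet()) {
            if (!(shard.getValue() instanceof Map)) continue;
            Object replicas = ((Map<?, ?>) shard.getValue()).get("replicas");
            if (!(replicas instanceof Map)) continue;
            for (Map.Entry<?, ?> replica : ((Map<?, ?>) replicas).entrySet()) {
                String key = String.valueOf(replica.getKey());
                if (!key.startsWith("core_node")) {
                    leftovers.add(shard.getKey() + ": legacy replica-key " + key);
                }
                if (replica.getValue() instanceof Map) {
                    Object nodeName = ((Map<?, ?>) replica.getValue()).get("node_name");
                    if (nodeName != null && !startsWithIPv4(nodeName.toString())) {
                        leftovers.add(shard.getKey() + ": node_name not IP-based: " + nodeName);
                    }
                }
            }
        }
        return leftovers;
    }

    // Crude check of the part before ':'; assumes the Solr-nodes use IPv4 addresses.
    private static boolean startsWithIPv4(String nodeName) {
        int colon = nodeName.indexOf(':');
        String host = colon < 0 ? nodeName : nodeName.substring(0, colon);
        return host.matches("\\d{1,3}(\\.\\d{1,3}){3}");
    }
}
```

For the automatically upgraded my_collection_1 shown above, find would report the missing router, the legacy replica-key and the hostname-based node_name. For a collection in the clean 4.4.0 layout it should return an empty list.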

solr.xml files

In 4.0.0 my solr.xml files contain a <cores>-tag with lots of <core>-tags underneath. This still works after upgrading to 4.4.0, but you will be running in "legacy mode", which will no longer be supported from Solr 5.x on

I would like my solr.xml files in my 4.0.0-to-4.4.0 upgraded system to be as they would have been if I had run 4.4.0 all along. Appendix 1) below shows my 4.4.0 solr.xml

core.properties files

In Solr 4.0.0 there are no core.properties files in <solr-home>/<replica-name> on disk (<solr-home> is controlled by VM-param -Dsolr.solr.home given when you start your Solr web-container (e.g. Jetty))

In Solr 4.4.0 there is a core.properties file for each replica in <solr-home>/<replica-name>. For my_collection_1 | shard1 | core_node1 it contains the following

 
name=my_collection_1_shard1_replica1
shard=shard1
collection=my_collection_1
coreNodeName=core_node1

I would like to have core.properties files in my 4.0.0-to-4.4.0 upgraded system, just as I would have if I had run 4.4.0 all along
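
The content of core.properties is plain key=value lines, so it can be read with java.util.Properties, for example to check a generated file before it is put on a Solr-node. This is a small sketch under that assumption. The class CorePropertiesSketch and its methods are hypothetical names, and the list of required keys is simply the four keys shown above, not an official Solr list.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper (not part of Solr): parses and sanity-checks core.properties content. */
final class CorePropertiesSketch {

    // The four keys found in the example file above.
    private static final String[] EXPECTED_KEYS = {"name", "shard", "collection", "coreNodeName"};

    /** Parses the text of a core.properties file. */
    static Properties parse(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the expected keys that are absent from the given properties. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : EXPECTED_KEYS) {
            if (props.getProperty(key) == null) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

Calling missingKeys(parse(...)) on the four lines above should give an empty list, and parse(...).getProperty("coreNodeName") gives the replica-key used in clusterstate.json.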

Data in collection znodes

For each collection there exists a znode (folder) in ZooKeeper at /collections/<collection-name> (in Solr-area). As you probably know, a znode can contain data even though it is a folder (has "children")

In 4.0.0 the data of those collection-znodes is
 
{"configName":"my_conf"}

In 4.4.0 the data is
 
{"configName":"my_conf", "router":"implicit"}

I would like that in my 4.0.0-to-4.4.0 upgraded system as well.

How I corrected the differences

 

Now that we have seen all the differences I encountered, let's look at what I did to "correct" them, in order for my 4.0.0-to-4.4.0 upgraded system to seem as if it had been 4.4.0 all along

  1. Make sure ZooKeeper is running, but that the Solr 4.0.0 nodes are not
  2. Extract/download (using your favorite tool) clusterstate.json from ZooKeeper (root of Solr-area) into a folder <my-favorite-folder>/upgrade/before on the machine from which you do the upgrade
  3. Correct ranges in clusterstate.json as explained here (only necessary if you upgrade from 4.0.0)
    java -classpath .:${SOLR_4_0_0_INSTALL}/dist/apache-solr-solrj-4.0.0.jar:${SOLR_4_0_0_INSTALL}/dist/solrj-lib/zookeeper-3.3.6.jar:${SOLR_4_0_0_INSTALL}/dist/solrj-lib/commons-io-2.1.jar:${SOLR_4_0_0_INSTALL}/dist/solrj-lib/slf4j-api-1.6.4.jar CorrectShardRangesInClusterState <my-favorite-folder>/upgrade/before/clusterstate.json <my-favorite-folder>/upgrade/after_ranges_fix/clusterstate.json
  4. Compile ClusterState4_0ToClusterStateAndCoreProperties4_4Upgrader.java from Appendix 2) below against 4.4.0 Solr code. You need to implement the method hostnameToIP yourself first
  5. Convert clusterstate.json from 4.0.0-style to 4.4.0-style and generate all core.properties files
    java -classpath .:${SOLR_4_4_0_INSTALL}/dist/apache-solr-solrj-4.4.0.jar:${SOLR_4_4_0_INSTALL}/dist/solrj-lib/commons-io-2.1.jar ClusterState4_0ToClusterStateAndCoreProperties4_4Upgrader <my-favorite-folder>/upgrade/after_ranges_fix <my-favorite-folder>/upgrade/after
  6. By now you have your new configuration files in <my-favorite-folder>/upgrade/after
    • clusterstate.json
    • <IP>/data/<replica-name>/core.properties (an <IP> for each Solr-node in your system, and a <replica-name> for each replica run by that Solr-node)
  7. Upload (using your favorite tool) <my-favorite-folder>/upgrade/after/clusterstate.json to ZooKeeper (root of Solr-area), replacing the existing one
  8. Upload all core.properties files (bash example)
    for IP in <IP#1> <IP#2> ... <IP#N>; do  # mention the IPs of all your Solr-nodes
     scp -r <my-favorite-folder>/upgrade/after/${IP}/data/. <solr-node-user>@${IP}:<solr-home>
    done
  9. Compile SolrConfigDirInZookeeperUpgrader.java from Appendix 3) below against 4.4.0 Solr code
  10. Modify the data of all /collections/<collection-name> znodes in ZooKeeper (in Solr-area)
    java -classpath .:${SOLR_4_4_0_INSTALL}/dist/apache-solr-solrj-4.4.0.jar:${SOLR_4_4_0_INSTALL}/dist/solrj-lib/zookeeper-3.4.5.jar SolrConfigDirInZookeeperUpgrader <solr-zookeeper-connection-string> <name-of-your-solr-configuration>
  11. Now install the 4.4.0 binaries on all Solr-nodes (replacing the existing 4.0.0 binaries) and start them again - voilà!
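
Before uploading anything it may be worth checking that the generated folder really has the <IP>/data/<replica-name>/core.properties layout described in the list above. The following is only a sketch of such a check. GeneratedTreeLister is a made-up name, and Java 8 or newer is assumed (for Files.walk and lambdas), which is newer than what the appendix code itself requires.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: lists the core.properties files below the "after" folder. */
final class GeneratedTreeLister {

    /** Returns paths relative to afterFolder, sorted, using '/' as separator. */
    static List<String> corePropertiesFiles(Path afterFolder) {
        try (Stream<Path> paths = Files.walk(afterFolder)) {
            return paths
                .filter(p -> p.getFileName() != null
                    && p.getFileName().toString().equals("core.properties"))
                .map(p -> afterFolder.relativize(p).toString().replace('\\', '/'))
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each returned entry should have the form <IP>/data/<replica-name>/core.properties, and the number of entries should match the number of replicas in clusterstate.json.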

Disclaimer

 

No warranty. Test it thoroughly before you do it in production!

I have successfully followed the sketched procedure and upgraded a 4.0.0 system, so that it seems as if it had been running Solr 4.4.0 all along

Appendix

 

1)  My 4.4.0 solr.xml
 
<solr>
  <str name="sharedLib">${sharedLib:}</str>
  
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  </solrcloud>

  <shardHandlerFactory name="shardHandlerFactory"
    class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>

</solr>

2)  ClusterState4_0ToClusterStateAndCoreProperties4_4Upgrader.java
 
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.File;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.commons.io.FileUtils;
import org.apache.solr.common.cloud.ZkStateReader;

public class ClusterState4_0ToClusterStateAndCoreProperties4_4Upgrader {

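 /*
  * Usage: <input-folder> <output-folder>
  * Reads the 4.0.0-style <input-folder>/clusterstate.json and writes a 4.4.0-style
  * clusterstate.json plus <IP>/data/<core>/core.properties files to <output-folder>.
  * Assuming that the environment is set up with topology
  */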
 public static void main(String[] args) throws Exception {
  String inputFolder = args[0];
  String outputFolder = args[1];

  byte[] bytes = FileUtils.readFileToByteArray(new File(inputFolder + File.separator + "clusterstate.json"));
  Map<String, Object> stateMap = (Map<String, Object>) ZkStateReader.fromJSON(bytes);
  for (Entry<String, Object> collectionEntry : stateMap.entrySet()) {
   int nextCoreNodeNameNumber=1;
   Object collectionEntryValue = collectionEntry.getValue();
   if (collectionEntryValue instanceof Map<?, ?>) {
    for (Entry<?, ?> collectionMapEntry : ((Map<?, ?>)collectionEntryValue).entrySet()) {
     if (collectionMapEntry.getKey().toString().startsWith("shard")) {
      Object shardEntryValue = collectionMapEntry.getValue();
      if (shardEntryValue instanceof Map<?, ?>) {
       Map<String, Object> shardEntryMap = (Map<String, Object>)shardEntryValue;
       shardEntryMap.put("state", "active");
       Object replicasEntryValue = shardEntryMap.get("replicas");
       if (replicasEntryValue instanceof Map<?, ?>) {
        Map<String, Object> replicasEntryMap = (Map<String, Object>)replicasEntryValue;
        List<Object> replicasEntryMapValues = new ArrayList<Object>(replicasEntryMap.values());
        replicasEntryMap.clear();
        for (Object replicasEntryMapValue : replicasEntryMapValues) {
         String coreNodeName="core_node" + (nextCoreNodeNameNumber++);
         replicasEntryMap.put(coreNodeName, replicasEntryMapValue);
        }
        for (Entry<String, Object> replicasEntryMapEntry : replicasEntryMap.entrySet()) {
         Object replicasEntryMapEntryValue = replicasEntryMapEntry.getValue(); 
         if (replicasEntryMapEntryValue instanceof Map<?, ?>) {
          Map<String, Object> replicaMap = (Map<String, Object>)replicasEntryMapEntryValue;
          String nodeName = replicaMap.get("node_name").toString();
          String hostname = nodeName.substring(0, nodeName.indexOf(':'));
          String IP = hostnameToIP(hostname);
          String core = replicaMap.get("core").toString();
          String shard = replicaMap.get("shard").toString();
          String collection = replicaMap.get("collection").toString();
          File corePropertiesDir = new File(outputFolder + File.separator + IP + File.separator + "data" + File.separator + core);
          corePropertiesDir.mkdirs();
          File corePropertiesFile = new File(corePropertiesDir, "core.properties");
          corePropertiesFile.createNewFile();
          PrintWriter corePropertiesFileWriter = new PrintWriter(corePropertiesFile);
          try {
           corePropertiesFileWriter.println("name=" + core);
           corePropertiesFileWriter.println("shard=" + shard);
           corePropertiesFileWriter.println("collection=" + collection);
           corePropertiesFileWriter.println("coreNodeName=" + replicasEntryMapEntry.getKey());
          } finally {
           corePropertiesFileWriter.flush();
           corePropertiesFileWriter.close();
          }
          replicaMap.remove("roles");
          replicaMap.remove("shard");
          replicaMap.remove("collection");
          replicaMap.put("state", "down");
          replicaMap.put("node_name", replicaMap.get("node_name").toString().replace(hostname, IP));
          replicaMap.put("base_url", replicaMap.get("base_url").toString().replace(hostname, IP));
         }
        }
       }
      }
     }
    }
    Map<String, Object> shardEntries = new LinkedHashMap<String, Object>();
    for (Entry<?, ?> collectionMapEntry : ((Map<?, ?>)collectionEntryValue).entrySet()) {
     if (collectionMapEntry.getKey().toString().startsWith("shard")) {
      shardEntries.put(collectionMapEntry.getKey().toString(), collectionMapEntry.getValue());
     }
    }
    for (Entry<String, Object> shardEntry : shardEntries.entrySet()) {
     ((Map<?, ?>)collectionEntryValue).remove(shardEntry.getKey());
    }
    ((Map<String, Object>)collectionEntryValue).put("shards", shardEntries);
    ((Map<String, Object>)collectionEntryValue).put("router", "compositeId");
   }
  }

  bytes = ZkStateReader.toJSON(stateMap);
  FileUtils.writeByteArrayToFile(new File(outputFolder + File.separator + "clusterstate.json"), bytes);
  System.exit(0);
 }
 
 protected static String hostnameToIP(String hostname) {
  // TODO calculate and return the IP corresponding to hostname
  throw new UnsupportedOperationException("hostnameToIP is not implemented yet");
 }
}
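
hostnameToIP is left for you to implement. One possible implementation is a plain lookup through java.net.InetAddress, sketched below. It assumes that DNS (or the hosts file) on the machine running the upgrader resolves each hostname to the same address the Solr-node will register in ZooKeeper. That is an assumption about your environment, not something the upgrader can verify. The wrapper class name HostnameToIpSketch is made up for this example.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Sketch of one way to implement hostnameToIP; the class name is hypothetical. */
final class HostnameToIpSketch {

    /** Resolves hostname using the local resolver and returns the textual IP address. */
    static String hostnameToIP(String hostname) {
        try {
            // For a literal IP address this only validates the format, no lookup is done.
            return InetAddress.getByName(hostname).getHostAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot resolve hostname " + hostname, e);
        }
    }
}
```

If a Solr-node has several network interfaces, the lookup may return a different address than the one the node uses, so compare the result with the node_name of a clean 4.4.0 node before relying on it.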

3)  SolrConfigDirInZookeeperUpgrader.java
 
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.security.NoSuchAlgorithmException;
import java.util.List;

import org.apache.solr.common.cloud.SolrZkClient;
import org.apache.solr.common.cloud.SolrZooKeeper;
import org.apache.zookeeper.KeeperException;

public class SolrConfigDirInZookeeperUpgrader
{
    public static final String SOLR_COLLECTION_NODE = "/collections";
    public static final String SOLR_CONFIG_PREFIX = 
      "{\n" + 
      "  \"configName\":\"";
    public static final String SOLR_CONFIG_POSTFIX = 
      "\",\n" + 
      "  \"router\":\"implicit\"}";
    
    public static void main(String[] args) throws KeeperException, InterruptedException, NoSuchAlgorithmException 
    {
        SolrConfigDirInZookeeperUpgrader instance = new SolrConfigDirInZookeeperUpgrader(args[0], args[1]);
        instance.updateSolrConfNameForAllCollections();
    }

    private final String solrZkConnectionStr;
    private final String confName;
    
 public SolrConfigDirInZookeeperUpgrader(String solrZkConnectionStr, String confName) {
  super();
  this.solrZkConnectionStr = solrZkConnectionStr;
  this.confName = confName;
 }

 private void updateSolrConfNameForAllCollections() throws KeeperException, InterruptedException
 {
        final SolrZkClient client = new SolrZkClient(solrZkConnectionStr, 16000);
        try {
         SolrZooKeeper zk = client.getSolrZooKeeper();
         List<String> children = zk.getChildren(SOLR_COLLECTION_NODE, null);
         
         for(String child: children) {
          updateData(zk, SOLR_COLLECTION_NODE + "/" + child);
         }
        } finally {
         client.close();
        }
 }
 
 private void updateData(SolrZooKeeper zk, String node) throws KeeperException, InterruptedException {
  zk.setData(node, new String(SOLR_CONFIG_PREFIX + confName + SOLR_CONFIG_POSTFIX).getBytes(), -1);
 }

}
