Kafka & Elasticsearch

Kafka Fail-over (cluster)

이주성 2019. 1. 7. 18:53
cluster

Reference site

filebeat

I stopped one of the three nodes in the Kafka cluster, and filebeat could no longer produce. The log says some partitions are leaderless.

2019-01-04T20:31:18.060+0900    INFO    kafka/log.go:53 Connected to broker at KAFKA-01:9092 (unregistered)
2019-01-04T20:31:18.067+0900    INFO    kafka/log.go:53 client/brokers registered new broker #1 at KAFKA-01:9092
2019-01-04T20:31:18.067+0900    INFO    kafka/log.go:53 kafka message: client/metadata found some partitions to be leaderless
2019-01-04T20:31:18.067+0900    INFO    kafka/log.go:53 client/metadata retrying after 250ms... (2 attempt remaining)
2019-01-04T20:31:18.318+0900    INFO    kafka/log.go:53 client/metadata fetching metadata for [topic-ui5.0-action] from broker KAFKA-01:9092
2019-01-04T20:31:18.321+0900    INFO    kafka/log.go:53 kafka message: client/metadata found some partitions to be leaderless
2019-01-04T20:31:18.321+0900    INFO    kafka/log.go:53 client/metadata retrying after 250ms... (1 attempts remaining)

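Just in case, I ran topic describe, and it reports Leader: -1. The topic has replication-factor 1, so its only replica lived on the broker I stopped. To avoid leader=-1, the topic needs a replication-factor greater than 1; I set it to the number of nodes (3).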

Topic:topic-ui5.0-action    PartitionCount:1    ReplicationFactor:1 Configs:
    Topic: topic-ui5.0-action   Partition: 0    Leader: -1   Replicas: 2 Isr: 2
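
Checking the describe output by eye gets tedious once there are many partitions. Below is a minimal sketch that scans the text for partitions with Leader: -1. The function name `leaderless_partitions` and the regular expression are my own assumptions, written against the output format shown in this post; other Kafka versions may print extra fields.

```python
import re

# Matches one partition line of `kafka-topics --describe` output,
# in the format shown in this post (an assumption; newer versions may differ).
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+Isr:\s*(?P<isr>[\d,]*)"
)


def leaderless_partitions(describe_output):
    """Return (topic, partition, replicas) for every partition whose leader is -1."""
    found = []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if match and int(match.group("leader")) == -1:
            replicas = [int(b) for b in match.group("replicas").split(",")]
            found.append((match.group("topic"), int(match.group("partition")), replicas))
    return found


sample = """Topic:topic-ui5.0-action    PartitionCount:1    ReplicationFactor:1 Configs:
    Topic: topic-ui5.0-action   Partition: 0    Leader: -1   Replicas: 2 Isr: 2
"""

# The header line is skipped; only the partition line is reported.
print(leaderless_partitions(sample))
```

The replica list in each result shows which broker has to come back before the partition can have a leader again.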

With replication-factor=3, killing Kafka node 1 gives the result below. Partition 0, whose first replica is broker 1, is now led by broker 2.

Topic:topic-ui5.0-action    PartitionCount:3    ReplicationFactor:3 Configs:
    Topic: topic-ui5.0-action   Partition: 0    Leader: 2   Replicas: 1,2,3 Isr: 1,3,2
    Topic: topic-ui5.0-action   Partition: 1    Leader: 2   Replicas: 2,3,1 Isr: 1,3,2
    Topic: topic-ui5.0-action   Partition: 2    Leader: 3   Replicas: 3,1,2 Isr: 1,3,2
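
To see why replication-factor=3 survives the failure while replication-factor=1 does not, here is a simplified model of leader election: take the first broker in the replica list that is alive and in the ISR. This is only a sketch of the idea, not Kafka's controller code; `elect_leader` is a hypothetical helper, and it ignores unclean leader election and ISR shrinking.

```python
def elect_leader(replicas, isr, live_brokers):
    """Pick the first assigned replica that is alive and in the ISR, or -1 if none."""
    for broker in replicas:
        if broker in live_brokers and broker in isr:
            return broker
    return -1


# Broker 1 is stopped; brokers 2 and 3 are still running.
live = {2, 3}
partitions = {0: [1, 2, 3], 1: [2, 3, 1], 2: [3, 1, 2]}
for partition, replicas in partitions.items():
    print(partition, elect_leader(replicas, isr=set(replicas), live_brokers=live))

# With replication-factor=1, the only replica is on the stopped broker.
print(elect_leader([2], isr={2}, live_brokers={1, 3}))
```

Under this model the three partitions get leaders 2, 2 and 3, the same as in the describe output above, and the single-replica case falls back to -1.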

Consumer, Streams

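Oddly, Consumer and Streams sometimes stop working when just one of the three Kafka brokers is killed, and sometimes they keep working. Intermittent failures like this are the hardest to track down. So I looked at the Kafka server.log and found the error logs below.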

DEBUG [MetadataCache brokerId=1] Error while fetching metadata for __consumer_offsets-1: listener ListenerName(SASL_PLAINTEXT) not found on leader -1 (kafka.server.MetadataCache)
DEBUG [MetadataCache brokerId=1] Error while fetching metadata for __consumer_offsets-19: listener ListenerName(SASL_PLAINTEXT) not found on leader -1 (kafka.server.MetadataCache)
DEBUG [MetadataCache brokerId=1] Error while fetching metadata for __consumer_offsets-28: listener ListenerName(SASL_PLAINTEXT) not found on leader -1 (kafka.server.MetadataCache)

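What is the __consumer_offsets topic? It is the internal topic where consumer groups store their committed offsets. Running kafka-topics --describe on it shows the dreaded Leader: -1 again, and the replication-factor is 1, so fail-over cannot work. I had set offsets.topic.replication.factor=3 in server.properties, yet ReplicationFactor is still 1. It looks like a bug, although this setting is only applied when the topic is first auto-created, so a topic created earlier keeps its original factor.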

$ kafka-topics --zookeeper localhost:2181 --describe --topic __consumer_offsets
Topic:__consumer_offsets    PartitionCount:50   ReplicationFactor:1 Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
    Topic: __consumer_offsets   Partition: 0    Leader: 1   Replicas: 1 Isr: 1
    Topic: __consumer_offsets   Partition: 1    Leader: -1  Replicas: 2 Isr: 2
    Topic: __consumer_offsets   Partition: 2    Leader: 3   Replicas: 3 Isr: 3
    Topic: __consumer_offsets   Partition: 3    Leader: 1   Replicas: 1 Isr: 1
    Topic: __consumer_offsets   Partition: 4    Leader: -1  Replicas: 2 Isr: 2
    Topic: __consumer_offsets   Partition: 5    Leader: 3   Replicas: 3 Isr: 3
    Topic: __consumer_offsets   Partition: 6    Leader: 1   Replicas: 1 Isr: 1

Since the topic already exists, I raised its replication-factor by hand with a reassignment file that lists all 50 partitions (0 to 49). The broker IDs must be the real ones (1, 2, 3 here).

# consumer-offsets-replication-factor.json
{"version":1,
 "partitions":[
   {"topic":"__consumer_offsets", "partition":0,  "replicas":[1, 2, 3]},
   {"topic":"__consumer_offsets", "partition":1,  "replicas":[1, 2, 3]},
   {"topic":"__consumer_offsets", "partition":2,  "replicas":[1, 2, 3]},
   {"topic":"__consumer_offsets", "partition":3,  "replicas":[1, 2, 3]},
    ...
   {"topic":"__consumer_offsets", "partition":49, "replicas":[3, 2, 1]}
]}

$ kafka-reassign-partitions --zookeeper localhost:2181 --reassignment-json-file ./consumer-offsets-replication-factor.json --execute

$ kafka-topics --zookeeper localhost:2181 --describe --topic __consumer_offsets
Topic:__consumer_offsets    PartitionCount:50   ReplicationFactor:3 Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
    Topic: __consumer_offsets   Partition: 0    Leader: 1   Replicas: 1,2,3 Isr: 1,3,2
    Topic: __consumer_offsets   Partition: 1    Leader: 2   Replicas: 1,2,3 Isr: 2,3,1
    Topic: __consumer_offsets   Partition: 2    Leader: 3   Replicas: 1,2,3 Isr: 3,1,2
    Topic: __consumer_offsets   Partition: 3    Leader: 1   Replicas: 1,2,3 Isr: 1,3,2
    Topic: __consumer_offsets   Partition: 4    Leader: 2   Replicas: 1,2,3 Isr: 2,3,1
    Topic: __consumer_offsets   Partition: 5    Leader: 3   Replicas: 1,2,3 Isr: 3,1,2
    ... (50 partitions in total)
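
Writing 50 entries by hand is error-prone, so the reassignment file can also be generated. The sketch below assumes broker IDs 1, 2, 3 as in the describe output; `build_reassignment` is a hypothetical helper of mine, not part of Kafka. It rotates the first replica so the preferred leaders are spread over the brokers, which is a choice I made here rather than something the output above shows.

```python
import json


def build_reassignment(topic, partition_count, broker_ids):
    """Put every partition on all brokers, rotating which broker comes first."""
    partitions = []
    for partition in range(partition_count):
        shift = partition % len(broker_ids)
        # Rotate the broker list so the first replica differs per partition.
        replicas = broker_ids[shift:] + broker_ids[:shift]
        partitions.append({"topic": topic, "partition": partition, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


plan = build_reassignment("__consumer_offsets", 50, [1, 2, 3])

# Print the plan; redirect this to consumer-offsets-replication-factor.json.
print(json.dumps(plan, indent=1))
```

The saved output can then be passed to kafka-reassign-partitions with --reassignment-json-file, as in the command earlier in this section.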

Connector

I stopped one node of the Kafka cluster, and the connector could no longer consume. Once again, partitions such as connect-offsets-13 have no leader.

WARN [Consumer clientId=consumer-1, groupId=connect-cluster] 8 partitions have leader brokers without a matching listener, including [connect-offsets-13, connect-offsets-4, connect-offsets-22, connect-offsets-16, connect-offsets-7, connect-offsets-10, connect-offsets-1, connect-offsets-19] (org.apache.kafka.clients.NetworkClient:961)

So I described the topics the connector had created (connect-offsets, connect-status, connect-configs), and again they have replication-factor=1.

Topic:connect-configs PartitionCount:1    ReplicationFactor:1 Configs:
    Topic: connect-configs   Partition: 0    Leader: -1   Replicas: 2 Isr: 2

According to the documentation, replication-factor=3 has to be set manually, as below. To do it this way, you have to delete the existing topics first and then register the connectors again. Reference: http://docs.confluent.io

  bin/kafka-topics --create --zookeeper localhost:2181 --topic connect-configs --replication-factor 3 --partitions 1 --config cleanup.policy=compact
  bin/kafka-topics --create --zookeeper localhost:2181 --topic connect-offsets --replication-factor 3 --partitions 50 --config cleanup.policy=compact
  bin/kafka-topics --create --zookeeper localhost:2181 --topic connect-status --replication-factor 3 --partitions 10 --config cleanup.policy=compact

Important: even after the connector topics are set to replication-factor=3, they sometimes end up at 1 again later. My guess is that the __consumer_offsets topic has to be set to replication-factor=3 first.

Changing topic settings

Changing the partition count

kafka-topics --alter --zookeeper ZOOKEEPER-01 --topic topic-ui5.0-action-json --partitions 3

Changing the replication-factor

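It is important to vary the first entry of replicas across partitions (1, 2, 3). The first replica is the preferred leader: if replicas[0]=1, broker 1 becomes the leader at the next election, and if it is 3, broker 3 does.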

{"version":1,
  "partitions":[
     {"topic":"topic-ui5.0-action","partition":0,"replicas":[1,2,3]},
     {"topic":"topic-ui5.0-action","partition":1,"replicas":[2,3,1]},
     {"topic":"topic-ui5.0-action","partition":2,"replicas":[3,2,1]}
]}

kafka-reassign-partitions --zookeeper ZOOKEEPER-01 --reassignment-json-file increase-replication-factor.json --execute
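
Before running the reassignment, it can help to check that the first replicas really are spread over the brokers. A minimal sketch, assuming the JSON layout used above; `preferred_leader_counts` is a hypothetical helper name.

```python
import json
from collections import Counter


def preferred_leader_counts(plan):
    """Count how many partitions list each broker as their first replica."""
    return Counter(entry["replicas"][0] for entry in plan["partitions"])


plan = json.loads("""
{"version":1,
  "partitions":[
     {"topic":"topic-ui5.0-action","partition":0,"replicas":[1,2,3]},
     {"topic":"topic-ui5.0-action","partition":1,"replicas":[2,3,1]},
     {"topic":"topic-ui5.0-action","partition":2,"replicas":[3,2,1]}
]}
""")

counts = preferred_leader_counts(plan)
print(dict(counts))
```

Each broker appears once as the first replica here, so leadership should end up evenly distributed. A plan where one broker dominates the counts would put most leaders on that broker.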
