Nextcloud PostgreSQL数据损坏报错 ERROR: MultiXactId xxxxxxxxx has not been created yet — apparent wraparound 处理

发布于 2023-06-19  598 次阅读


postgres,怎么老是你?

数据库又炸了,执行语句时错误信息为ERROR: MultiXactId xxxxxxxxx has not been created yet -- apparent wraparound,执行VACUUM FULL的错误信息为WARNING: concurrent delete in progress within table "oc_filecache"

尝试REINDEX table oc_filecache,错误信息为:

WARNING:  concurrent delete in progress within table "oc_filecache"
ERROR:  could not access status of transaction 1627717632
DETAIL:  Could not open file "pg_subtrans/6105": No such file or directory.
CONTEXT:  while checking uniqueness of tuple (775,1) in relation "oc_filecache"

CONTEXT中提供了ctid,通过select * from oc_filecache where ctid='(775,1)';检查该行,返回的结果是一个空行。

试图运行delete from xxxx where ctid = '(775,1)';删除该行,返回与REINDEX同样的错误。

fsck -n /dev/sda1发现filesystem有问题,进入rescue system运行fsck -y /dev/sda1修复。

重启后错误依旧存在,运行SELECT * FROM pg_stat_activity WHERE state = 'active';查看正在运行的query,发现没有任何query。

直接drop table之后VACUUM FULL成功,但损失数据太多了,根据这个post重建了oc_filecache模板也没有用。

使用的命令为:

DROP TABLE IF EXISTS oc_filecache;

CREATE TABLE public.oc_filecache (
    fileid bigint NOT NULL,
    storage bigint DEFAULT 0 NOT NULL,
    path character varying(4000) DEFAULT NULL::character varying,
    path_hash character varying(32) DEFAULT ''::character varying NOT NULL,
    parent bigint DEFAULT 0 NOT NULL,
    name character varying(250) DEFAULT NULL::character varying,
    mimetype bigint DEFAULT 0 NOT NULL,
    mimepart bigint DEFAULT 0 NOT NULL,
    size bigint DEFAULT 0 NOT NULL,
    mtime bigint DEFAULT 0 NOT NULL,
    storage_mtime bigint DEFAULT 0 NOT NULL,
    encrypted integer DEFAULT 0 NOT NULL,
    unencrypted_size bigint DEFAULT 0 NOT NULL,
    etag character varying(40) DEFAULT NULL::character varying,
    permissions integer DEFAULT 0,
    checksum character varying(255) DEFAULT NULL::character varying
);

ALTER TABLE public.oc_filecache OWNER TO <<nextcloud_user>>;

CREATE SEQUENCE public.oc_filecache_fileid_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

ALTER TABLE public.oc_filecache_fileid_seq OWNER TO <<nextcloud_user>>;

ALTER TABLE ONLY public.oc_filecache ALTER COLUMN fileid SET DEFAULT nextval('public.oc_filecache_fileid_seq'::regclass);

这里的<<nextcloud_user>>需要改成nextcloud对应的oc_xxxx,否则occ会有权限错误,相关讨论在Getting started with occ and getting stuck with db permissions

之后找了个之前的备份,导出了oc_filecache这个表,然后导入到现在的数据库中。导入完成后使用docker exec -u www-data nextcloud php /var/www/html/occ files:scan --all重建oc_filecache。网页上Files tab是空白的,F12之后console有大量TypeError: OCA.Files.FileList is undefined错误。

scan过程中出现错误nextcloud Exception during scan: "" is locked, existing lock on file: exclusive,使用SELECT * FROM oc_file_locks;以及docker exec -ti nextcloud-cache redis-cli -a xxxxxxxx flushall后重试。

重试后在运行四小时左右时报错deadlock detected,似乎不是大问题,以防万一运行了occ files:cleanup以及根据settings界面的提示运行了occ db:add-missing-indices,之后重新occ files:scan --all

一天之后报错PHP Fatal error, Allowed memory size of 4294967296 bytes exhausted,查阅文档后发现需要运行occ maintenance:data-fingerprint,之后运行occ -v files:scan --all去除memory limit并加上verbose重试。

扫描完成后发现网页上Files tab依旧是空白的,尝试更新到25.0.7-fpm,更新完成后就正常了,怀疑是更新的过程中的某个自动的操作解决了问题。


Sup